Preface


The 6th Biennial International Conference for Mathematics and Computation in Music 
(MCM 2017) took place during June 26-29, 2017, at the Faculty of Sciences of the 
Universidad Nacional Autonoma de Mexico, in Mexico City, Mexico. Additional 
venues for recitals were kindly provided by the Escuela Superior de Musica and the 
Museo Nacional de Historia “Castillo de Chapultepec”. 

As the flagship conference of the Society for Mathematics and Computation in 
Music (SMCM), MCM 2017 provided a dedicated platform for the communication and 
exchange of ideas among researchers in mathematics, informatics, music theory, 
composition, musicology, and related disciplines. It brought together researchers from 
around the world who combine mathematics or computation with music theory, music 
analysis, composition, and performance. 

The program, available at http://www.mcm2017.org, featured three plenary
lectures: the first by Guerino Mazzola (who introduced the musical mathematical game
as a complement to mathematical music theory), the second by Harald Fripertinger
(who spoke on the combinatorics of tone-rows and their role in music), and a final one
by Julio Estrada (who described some of the mathematical tools and inspirations that
underlie his compositions).

A panel titled “Contemporary Music Composition in Relation to Mathematics
and Computing: Current Perspectives and Approaches” brought together four Mexican
composers: Juan Sebastian Lach (Conservatorio de las Rosas), Roberto
Morales-Manzanares (Universidad de Guanajuato), Gabriel Pareyon (CENIDIM), and
Edmar Soria. It offered valuable firsthand testimony of the fruitful interplay of
mathematical and computational approaches in the creation of music.

During three daily one-hour sessions, Octavio Alberto Agustin-Aquino (a member
of the Organizing and Scientific Program Committee) delivered a “nano-course” on
Guerino Mazzola’s mathematical music theory. It was intended both as an homage to
the current SMCM president and as an introduction, for as wide an audience as
possible, to the techniques, results, and philosophical postures contained in Mazzola’s
work, with an emphasis on counterpoint.

The program included three evening recitals. The first was a guitar recital by 
Octavio Alberto Agustin-Aquino, who visited musical landscapes from eight countries 
(four from Europe and four from America) in a travelling salesman route, while 
keeping the proportion of the durations in approximate correspondence to that of the 
landmasses of the continents. The second one was a free jazz recital by Heinz Geisser 
(drums) and the president of the SMCM, Guerino Mazzola (piano), in the auditorium 
of the Escuela Superior de Musica, which constituted an electrifying dialogue of 
gestures and mutual spaces of performance. The third one was performed by Harald 
Fripertinger (flute) and the head of the Organizing and Scientific Program Committee, 
Emilio Lluis-Puebla (piano), featuring music from Telemann, Beethoven, Schubert, 
Gant, Marsh, and Thomas, in the wonderful environment of the Alcazar of the Museo 
Nacional de Historia “Castillo de Chapultepec”, as a pure enjoyment of music, 
mathematics, and life. 

The chapters in these proceedings correspond to the papers and two selected posters 
presented at the conference, following a careful peer-review process, which was 
optionally double-blind. We received 40 submissions from 62 authors across 10 
countries. Each submission was assigned to one or two reviewers. A paper was
accepted if and only if the recommendation of the reviewers was positive and a
majority of the editors judged it a meritorious contribution; some papers required a
second round of revisions. A total of 28 papers were accepted following review.

Last, but not least, we thank the following institutions for providing their infrastructure
and human resources for the organization and promotion of MCM 2017:

- Facultad de Ciencias de la Universidad Nacional Autonoma de Mexico 

- Society for Mathematics and Computation in Music 

- Escuela Superior de Musica 

- Georgia State University 

- Museo Nacional de Historia 

- Sociedad Matematica Mexicana 

- Universidad de la Canada 

- Universidad Tecnologica de la Mixteca 

July 2017 Octavio A. Agustin-Aquino 

Emilio Lluis-Puebla 
Mariana Montiel 
Primal-Circular Substitutions


Marcus D. Booth 1,2

1 San Diego State University, San Diego, USA 
2 Tulane University, New Orleans, USA 
mboothl@tulane.edu 


Abstract. There are two ongoing tensions in the creation of new musical 
systems that traditional innovation procedures in the domain of harmony have 
acknowledged yet left under-explored in the systematic sense from the composer’s
perspective. Learned terminological constraints and reactionary creative
practices often limit the realization of underlying creative logic that connects 
past influences to present innovations, and creative procedures on one structural 
unit of a particular musical element to those on another. The following paper 
will proceed by example from theory toward a compositional end, employing 
algebraic techniques to create a system of chord substitution which serves as an 
exemplary solution of the aforementioned issues. Primal-circular substitution 
has a basis in western tertian harmony, shows compatibility with 
neo-Riemannian local transformation sequences, and re-envisions substitutions 
as the realization of globally applicable systems. 

Keywords: Chord substitution • Re-harmonization system • Composition practices


1 Introduction 

Two major themes have long pervaded music composition across cultural, chronological,
and creative boundaries. One is the tendency of creative progress to be
reactionary rather than systematic, involving a dichotomous choice between
slightly editing or wholly rejecting a past creative system. Initially, a composer learns
the traditions of the nearby and recent, as a social being learns a language in preparation
for the task of being effective socially. Then he or she reacts to these in the
process of incorporating them into the worldview that contextualizes the compositional
products. A decision is made about how effective or not these traditional approaches
are, and action is taken over time to increase the efficacy of the compositional tools and
transmit the composer’s message.

The second theme at play in the compositional process is consideration of the
interplay between local and global musical structures. Focusing on either, the composer
must decide which combinations of one or more musical elements (e.g. melody, harmony,
timing, timbre) most strongly serve the overall purpose of the work (or
some other large formal construct), and which large scale musical forms and techniques
will constrain the treatment of these elements. The orientation described in the first
theme gives rise to particular tendencies in this decision-making process and becomes
the starting point for implementation. The composer’s palette is governed by the
evolution of a language and its implied “conditions of possibility” [3]. Over the course
of composing a work, making a hierarchy of decisions of the aforementioned nature,
the composer intuitively creates a system of patterns of which he or she is only partially
conscious in the moment. When one level of patterned abstraction is being acted upon
creatively, another is unfolding logically without agency.

© Springer International Publishing AG 2017

In what follows, a method of chord substitution will be introduced in an attempt to
provide an example of thinking about harmony and altering traditional chord progressions
that overcomes the epistemological limitations previously described. The
approach, called primal-circular substitution, makes use of cyclic quotient groups of
$\mathbb{Z}_{12}$ to explain mutually dependent interval circularities and the harmonies they relate
for use in re-harmonization procedures.


2 Aesthetic and Philosophical Concerns 

While aesthetic judgments in the absolute sense may be left to the individual and the 
situational, in the case of employing mathematics to compositional ends, a few words 
are needed in explanation and defense of the approach. Without this, we are left with 
the trivial case that any finite collection of objects, musical or not, can be placed in an 
organizational and axiomatic framework that appears mathematically sound because it 
was initially designed to be so. While not sufficient to justify a new system, it is
certainly necessary to view this extreme not as a potential derailment of the theory, but
as an indication that the astounding variety of organizational structures that mathematics
explains makes mathematical thinking central to human intuition [1]. In the case
of primal-circular substitution, this defense then has two parts. 

The first is that the theory is not a freely generative approach to harmony, but one 
with existing musical anchors. It is a theory of substitution which uses prime number 
patterns to catalogue similarities between chords and allow substitution of part or all of 
a known chord progression. Regarding commonly accepted features of tonality, the 
example of the process soon to be introduced is centered on circularity and western 
tertian notions of consonance, maintains chord shapes and distances [4], and introduces 
a new scope of underlying structural similarity that re-envisions sonic relationships 
known to work aesthetically for audiences of the past and present. This maintains the 
composer’s opportunity to manipulate emotions rather than just logically connected 
patterns. 

The second point worthy of mention is that mathematics provides a language for 
relating formal components of different types of inquiry about the world, and is not 
itself responsible for cases in which the user does not choose to do so. Again, this is not 
an argument restrictively in favor of a particular teleological position on the use of 
mathematics in music. It is safe to say though, that representationalism has been a key 
mode of creative endeavor in music from inspirations acknowledged to form and 
content decided upon by composers, so it provides one solid base for defending the 
widespread relevance of the quantitative approach. With mathematics being largely a 
study of structures and processes independent of content that may conform to its 
particular models, its use in connecting music to other human endeavors and thought 
patterns knows few boundaries. Beyond its representational potential by way of
mapping relations, it also facilitates specific instances of the reductionist thought that
has been pervasive across different types of scientific inquiry at least since the Age of
Enlightenment [1]. From Chemillier’s “ethnomathematics” to Xenakis’ inspiration
drawn from architecture and Cage’s interests in chance and choice, many exemplary 
connections between thought structure and medium-independent creative output can be 
found which invite mathematical scrutiny as described. 

Now that the long-surviving relevance of mathematics to time- and efficacy-tested
aesthetic practices has been summarized, let us return to the plan to introduce the
primal-circular substitution framework. In the next section, we will begin by defining
the basic tools for creating one version of the system from the starting point of accepted
tonal principles: interval cycles (in all such systems) and major/minor chord structures
(this version). Following that, we will advance our conceptualization to the codependency
of intervallic cycles, populating the chord substitution charts in the process so
they are ready for compositional use. In doing this, mathematical generalizations will
become apparent and proofs of those will be offered. Finally, we will show the application
of the system in an especially restrictive formal context: the mapping of one
Hamiltonian progression (typically generated by P, L, and R operations) to another,
connecting the local (neo-Riemannian, chord to chord) to the global (primal-circular
substitution systems, chord/progression to superstructure) with mathematical continuity.

3 Primal-Circular Substitutions 

We will begin the example construction of primal-circular systems by noting some
behaviors of single intervals, as chord types as defined in the western European tradition
contain predefined subsets of the intervallic possibilities. The reader is assumed
to be familiar with determining basic cycling of intervals, so this part need not be
directly illustrated (numerical results are given exhaustively in Table 1). Interval
cycling is isomorphic to dividing a whole number of equal temperament octaves into
equal parts, or multiplying an interval out until it finishes one complete cycle through
distinct pitch classes. So to go from the culturally dependent notion of interval names to
something that can be extended free of terminological needs, we proceed to a mathematical
explanation. In this case, to allow both relative and absolute interpretation of
the system, we will move from consideration of the interval to consideration of the
pitch class numbers that arise. Table 1 shows the behavior of each pitch class n
when cycled multiplicatively by the interval M and adjusted modulo 12 for labeling.

The generating formula for Table 1 is the familiar:

$\text{Pitch Category} = Mn \bmod 12$

In the body of Table 1, one can observe the mod cycles for a given
multiplier horizontally. Tertian chords, then, correspond to groupings of three elements.
Table 1. Catalogue of mod cycles: Note numbers n are presented horizontally and multipliers
M are catalogued vertically, with the value of the above equation in each interior box

M \ n     1    2    3    4    5    6    7    8    9   10   11   12
  1       1    2    3    4    5    6    7    8    9   10   11   12
  2       2    4    6    8   10    0    2    4    6    8   10    0
  3       3    6    9    0    3    6    9    0    3    6    9    0
  4       4    8    0    4    8    0    4    8    0    4    8    0
  5       5   10    3    8    1    6   11    4    9    2    7    0
  6       6    0    6    0    6    0    6    0    6    0    6    0
  7       7    2    9    4   11    6    1    8    3   10    5    0
  8       8    4    0    8    4    0    8    4    0    8    4    0
  9       9    6    3    0    9    6    3    0    9    6    3    0
 10      10    8    6    4    2    0   10    8    6    4    2    0
 11      11   10    9    8    7    6    5    4    3    2    1    0


For a root pitch p, the following sets represent major and minor chords: 

Major chord = $(p,\ p + 4,\ p + 7)$

Minor chord = $(p,\ p + 3,\ p + 7)$

Multiplication example:

$4 \times \text{C major} = 4 \times (1, 5, 8) \bmod 12 = (4, 8, 8)$

$8 \times \text{C major} = 8 \times (1, 5, 8) \bmod 12 = (4, 4, 8)$

So C Major is of type 448/488.
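
The multiplication example can be sketched in Python. This is a minimal illustration under stated assumptions, not the author's implementation: the function names are hypothetical, the numbering C = 1 through B = 12 is inferred from the example C major = (1, 5, 8), and the type label is assumed to list the two sorted results in ascending order, as in "448/488".

```python
def chord_tones(root, quality):
    """Return (root, third, fifth) as pitch numbers; quality is 'major' or 'minor'."""
    third = 4 if quality == "major" else 3
    return (root, root + third, root + 7)


def multiply(tones, m):
    """Multiply each tone by m modulo 12 and sort, abstracting away voice order."""
    return tuple(sorted((m * t) % 12 for t in tones))


def chord_type(root, quality, m):
    """Build the type label from the results for M and 12 - M (assumed label format)."""
    tones = chord_tones(root, quality)
    labels = []
    for k in (m, 12 - m):
        labels.append("".join(str(v) for v in multiply(tones, k)))
    # Joining digits is only unambiguous when every residue is a single digit,
    # which holds for M = 3 and M = 4.
    return "/".join(sorted(labels))


c_major = chord_tones(1, "major")   # (1, 5, 8), assuming C = 1
print(multiply(c_major, 4))         # (4, 8, 8)
print(multiply(c_major, 8))         # (4, 4, 8)
print(chord_type(1, "major", 4))    # 448/488
```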

Categorizations can then be visualized in Table 1, but for ease of use in
compositional practices or experimentation with the theory, they will be organized in
tables below by category label and chord name before laying out the mathematical
reasoning explicitly and discussing the codependency of the three intervals
present in a tertian chord. Before we do this, note the following two necessary
abstractions:

1. Because this is a theory of chord substitution, which is an operation done on chord
progressions rather than on specific voice leadings, permutation of voices is
abstracted away by always ordering pitch class numbers from least to greatest after
each operation in modulo 12.

2. Because the substitution approach is based on comparisons of prime factors of
M with prime factors of 12, considering that the quotient groups $\mathbb{Z}_{12}/M\mathbb{Z}_{12}$ and
$\mathbb{Z}_{12}/(12-M)\mathbb{Z}_{12}$ are isomorphic, we categorize chords according to results of
multiplication by M and 12-M together, since the residue classes are re-orderings, which
we said above we abstract away.
The names of the chord types in the charts below correspond to the positions of
their individual tones in the mod cycles for M and 12-M. Four steps are now given,
explaining how to use the chord charts:

1. Write a traditional western tonal chord progression, e.g. I-vi-IV-V-I in C major: 

C Am F G C 

2. Choose the 3 x 8 chart or the 4 x 6 chart below (or other examples you generate via the
mathematics). For this example we will use the 3 x 8 chart.

3. Find the category (column) of each chord by locating it in the chart:

C Am F G C = 448/488, 448/488, 044/088, 004/008, 448/488.

4. Write a new progression using chords from each corresponding category (Tables 2
and 3).


Example: Cm Eb Ab Fm C.
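
The four steps above can be sketched as a small Python routine. It is an illustrative sketch only: the names `chord_type`, `build_chart` and `categories` are hypothetical, sharps stand in for the enharmonic spellings, and the numbering C = 1 through B = 12 is assumed from the earlier multiplication example.

```python
from collections import defaultdict

NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def chord_type(root, quality, m):
    """Type label of a triad for multiplier m, grouped with 12 - m (assumed format)."""
    third = 4 if quality == "major" else 3
    tones = (root, root + third, root + 7)
    labels = []
    for k in (m, 12 - m):
        residues = sorted((k * t) % 12 for t in tones)
        labels.append("".join(str(r) for r in residues))
    return "/".join(sorted(labels))


def build_chart(m):
    """Group all 24 major and minor triads by type label."""
    chart = defaultdict(list)
    for number, name in enumerate(NAMES, start=1):  # C = 1, ..., B = 12
        for quality in ("major", "minor"):
            chart[chord_type(number, quality, m)].append((name, quality))
    return dict(chart)


def categories(progression, m):
    """Step 3: look up the category of every chord in a progression."""
    return [chord_type(NAMES.index(name) + 1, quality, m)
            for name, quality in progression]


# Step 1: I-vi-IV-V-I in C major; step 4: the substitute Cm Eb Ab Fm C.
original = [("C", "major"), ("A", "minor"), ("F", "major"),
            ("G", "major"), ("C", "major")]
substitute = [("C", "minor"), ("D#", "major"), ("G#", "major"),
              ("F", "minor"), ("C", "major")]

print(categories(original, 4))
print(categories(substitute, 4) == categories(original, 4))  # True
```

A substitute progression is valid in this sense whenever its list of categories equals that of the original; any other member of `build_chart(4)[label]` could be chosen at each position.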


Table 2. Three substitution categories of 8 chords each: M = 4, 12-M = 8.

Type 448/488    Type 004/008    Type 044/088
C Major         C#/Db Major     C#/Db Minor
C Minor         D Minor         D Major
D#/Eb Major     E Major         E Minor
D#/Eb Minor     F Minor         F Major
F#/Gb Major     G Major         G Minor
F#/Gb Minor     G#/Ab Minor     G#/Ab Major
A Major         A#/Bb Major     A#/Bb Minor
A Minor         B Minor         B Major


Table 3. Four substitution categories of 6 chords each: M = 3, 12-M = 9.

Type 033/099    Type 366/669    Type 336/699    Type 003/009
C Major         C#/Db Major     C#/Db Minor     C Minor
D#/Eb Minor     D Minor         D Major         D#/Eb Major
E Major         F Major         F Minor         E Minor
G Minor         F#/Gb Minor     F#/Gb Major     G Major
G#/Ab Major     A Major         A Minor         G#/Ab Minor
B Minor         A#/Bb Minor     A#/Bb Major     B Major


Any values of M on the interval [1, 12] can be used to generate systems of harmony 
around points of symmetry dividing the octave as this procedure does. The particular 
choices for the examples above were made because of their connection to common 
rhythmic/metric divisions of 3 and 4 as well as observations by the author of their 
relationship to neo-Riemannian transformations. This in part serves the intent
expressed earlier to show that this system aligns with certain traditional features of 
western music and common creative treatments of other musical elements. 
Neo-Riemannian transformations were originally devised as an explanation for 
movement from individual chord to chord, the local [2]. Primal-circular substitution 
theory connects them to the global. This will be demonstrated following some further 
mathematical explanations of how the chord charts above are determined. 

As briefly outlined earlier, a chord is a set of fixed intervallic relationships that we
then map onto two of the M systems in Table 1. The M system takes each
circularity, i.e. that of the root, third, and fifth of the chord, and cycles them through the 
remainder classes. Depending upon the value of M and the position of each chord tone 
relative to the points of symmetry defined by cycling M itself starting from pitch class 0 
and ending at the first pitch p such that the product of M and p is congruent to 0 mod 
12, two of the three chord tones with distinct behaviors will define the category. One of 
these tones is always the root, simply because that is how the positions are defined and 
the chord is named. This leaves either the 3rd or the 5th of the chord irrelevant to the 
categorization depending on the mapping (with p as root) of p + 3 (for the third of 
minor chords), or p + 4 (for major chords) and p + 7 (the fifth) to the corresponding 
positions in the cycle of Mp mod 12. A proof of this is given below. 

First, note that: 

# of substitution categories = length of mod cycle = 12/GCF(12, M) 

This follows from the concept of circularity. Because the relative positions p,
p + (3 or 4), and p + 7 are fixed, as soon as one of the chord tones lands on an Mp such
that Mp mod 12 = 0, the cycle is complete. Referring to Table 1, we see the
following regarding the particular substitution sets we will continue to discuss
(Tables 4 and 5):


Table 4. Number of categories for various M values.

M Value    # of Categories (12/GCF(12, M))
3          4
4          3
8          3
9          4



Table 5. Number of chords per category for various M values.

M Value    # of Categories    # of Chords in Category
3          4                  6
4          3                  8
8          3                  8
9          4                  6
8 
Now we are prepared to prove that either the third or the fifth of the chord is
irrelevant to its categorization. Cases for multiple M values will be presented as the
point applies to each.

Consider again the general major and minor chord structures:

Major: $[p,\ p + 4,\ p + 7]$    Minor: $[p,\ p + 3,\ p + 7]$

Given the property, for a modulus $k$:

$(x + y) \bmod k = \big((x \bmod k) + (y \bmod k)\big) \bmod k$

we prove that the 3rd or the 5th of any chord is irrelevant for categorization purposes
by reducing the chord tones modulo the length of the mod cycle:

$[p,\ p + 4,\ p + 7] \bmod k$ or $[p,\ p + 3,\ p + 7] \bmod k$, with $k \in \{2, 3, 4, 6, 12\}$.

Now in order to map them onto the mod cycles for different M values, consider the
following:

Mod 2 Example (corresponding to M = 6)

$(p + 3) \bmod 2 = (p + 7) \bmod 2 = ((p \bmod 2) + 1) \bmod 2$
$(p + 4) \bmod 2 = p \bmod 2$

since $3 \bmod 2 = 7 \bmod 2$ and $4 \bmod 2 = 0 \bmod 2$.

So for a major or minor chord in root position, you have 3 tones with only 2 distinct
results modulo 2. Depending on the chord quality, either 3rd = 5th (minor chords) or
root = 3rd (major chords).

Mod 3 Example (corresponding to M = 4 or M = 8)

$(p + 3) \bmod 3 = p \bmod 3$
$(p + 7) \bmod 3 = ((p \bmod 3) + 1) \bmod 3$
$(p + 4) \bmod 3 = ((p \bmod 3) + 1) \bmod 3$
$(p + 4) \bmod 3 = (p + 7) \bmod 3$

since $3 \bmod 3 = 0$ and $4 \bmod 3 = 7 \bmod 3$.

Again for a major or minor chord in root position, you have 3 tones with only 2
distinct results modulo 3. Depending on the chord quality, either 3rd = 5th (major
chords) or root = 3rd (minor chords).
Mod 4 Example (corresponding to M = 3 or M = 9)

$(p + 3) \bmod 4 = ((p \bmod 4) + 3) \bmod 4$
$(p + 7) \bmod 4 = ((p \bmod 4) + 3) \bmod 4$
$(p + 4) \bmod 4 = p \bmod 4$
$(p + 3) \bmod 4 = (p + 7) \bmod 4$

since $4 \bmod 4 = 0 \bmod 4$ and $3 \bmod 4 = 7 \bmod 4$.

So for a major or minor chord in root position, you have 3 tones with only 2 distinct
results modulo 4. Depending on the chord quality, either 3rd = 5th (minor chords) or
root = 3rd (major chords).
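
The three modular cases can be confirmed by brute force. The following Python sketch uses hypothetical helper names and reports, for each modulus, which two chord tones coincide; it finds that the coinciding pair is fixed by the chord quality and is the same for every root.

```python
def residues(root, quality, modulus):
    """Residues of (root, third, fifth) for a triad on the given root."""
    third = 4 if quality == "major" else 3
    return [(root + step) % modulus for step in (0, third, 7)]


def coinciding_tones(quality, modulus):
    """Pairs of chord tones sharing a residue; computed on root 0,
    since adding the same root to every tone shifts all residues equally."""
    names = ("root", "third", "fifth")
    r = residues(0, quality, modulus)
    return [(names[i], names[j])
            for i in range(3) for j in range(i + 1, 3) if r[i] == r[j]]


for modulus in (2, 3, 4):
    for quality in ("major", "minor"):
        # Every one of the 12 roots yields exactly two distinct residues.
        assert all(len(set(residues(p, quality, modulus))) == 2
                   for p in range(1, 13))
        print(modulus, quality, coinciding_tones(quality, modulus))
```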

Now that the system has been explained mathematically and the construction has 
been shown, we conclude by showing musical examples on the next page, each of 
which satisfies the intentions described at the outset. Here, primal-circular substitution 
is used to alter local harmonies while satisfying the global choice to preserve a particularly
restrictive form. For this purpose, the progression chosen for re-harmonization
is Moreno Andreatta’s Hamiltonian progression entitled “Aprile”, from which new 
Hamiltonian progressions are generated. 

Note: “Aprile” was composed using P, L, and R transformations. Primal-circular 
substitution systems can be understood locally in this way as detailed in Appendix B. 

3 x 8 (M = 4 or 8) random selection re-harmonization of “Aprile” 

[Musical score: 3 x 8 re-harmonization of “Aprile”]


Random Chord Selections = 183765243718561452436827 *1 

Progression (category, chord #) = (2,1) (3,8) (1,3) (1,7) (3,6) (2,5) (2,2) (3,4) (2,3) 
(3,7) (1,1) (1,8) (1,5) (1,6) (3,1) (2,4) (3,5) (1,2) (1,4) (3,3) (2,6) (2,8) (3,2) (2,7) *(2,1) 

Progression (chord names) = Db major, B major, Eb major, A major, Ab major, G 
major, D minor, F major, E major, Bb minor, C major, A minor, Gb major, F# minor, C# 
minor, F minor, G minor, C minor, Eb minor, E minor, G# minor, B minor, D major, Bb 
major, *Db major 
4 x 6 (M = 3 or 9) random selection

[Musical score: 4 x 6 re-harmonization of “Aprile”]


Random Chord Selections = 6361534556132312645212446 

Progression (category, chord) = (3,6), (1,3), (4,6), (1,1), (4,5), (4,3), (1,4), (3,5), (2,5), 
(2,6), (3,1), (2,3), (2,2), (3,3), (2,1), (3,2), (1,6), (4,4), (1,5), (4,2), (4,1), (1,2), (3,4), (2,4), 
*(3,6) 

Progression (chord names) = Bb major, E major, B major, C major, G# minor, E minor, 
G minor, A minor, A major, Bb minor, C# minor, F major, D minor, F minor, Db major, D 
major, B minor, G major, Ab major, Eb major, C minor, D# minor, Gb major, F# minor, 

*Bb major 

With an extreme example of primal-circular substitution (for M = 3 and M = 4) in 
western harmonic context shown above, further experimentation on shorter and more 
conventional progressions is now left to the readers and composers. This approach to 
harmonic substitution can now be performed on aesthetically grounded materials 
according to the user, to novel creative ends serving his or her particular purpose at any 
harmonic structural level. 


Appendix A: Primal-Circular Substitution Charts (Reproductions) 

See Tables 6 and 7 


Table 6. M = 4, 12-M = 8.

Type 448/488    Type 004/008    Type 044/088
C Major         C#/Db Major     C#/Db Minor
C Minor         D Minor         D Major
D#/Eb Major     E Major         E Minor
D#/Eb Minor     F Minor         F Major
F#/Gb Major     G Major         G Minor
F#/Gb Minor     G#/Ab Minor     G#/Ab Major
A Major         A#/Bb Major     A#/Bb Minor
A Minor         B Minor         B Major
Table 7. M = 3, 12-M = 9.

Type 033/099    Type 366/669    Type 336/699    Type 003/009
C Major         C#/Db Major     C#/Db Minor     C Minor
D#/Eb Minor     D Minor         D Major         D#/Eb Major
E Major         F Major         F Minor         E Minor
G Minor         F#/Gb Minor     F#/Gb Major     G Major
G#/Ab Major     A Major         A Minor         G#/Ab Minor
B Minor         A#/Bb Minor     A#/Bb Major     B Major


Appendix B: Neo-Riemannian Analogues of Primal-Circular 
Substitutions 

Assumption: Use each transformation (P, L, R) either 0 or 1 times. 

Note: The term “preserves” is used below to indicate that a particular transformation 
on a chord in a particular category yields another chord in that category, whereas other 
transformations would yield chords outside the category. 

Summary of Neo-Riemannian Results 

• The category 448/488 preserves: P || R in any order and place, no L.

• The category 004/008 preserves: P & R consecutively in any order and place, L alone.

• The category 044/088 preserves: P || R first && P || R last.

• The category 033/099 preserves: P || L first && P || L last.

• The category 366/669 preserves: L & P consecutively in any order and place, R alone.

• The category 336/699 preserves: P || L first && P || L last.

• The category 003/009 preserves: L & P consecutively in any order and place, R alone.

Examples for clarification (use charts to verify): 

Ex. 1: Transformation of a chord in 448/488 by PR or RP yields another chord in
448/488.

Ex. 2: Transformation of a chord in 004/008 by L alone yields another chord in 
004/008. So does transformation of a chord in 004/008 by PR or RP. 

Ex. 3: Transformation of a chord in 044/088 by PR, RP, PLR, or RLP yields 
another chord in 044/088. 
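
Examples 1 to 3 can be verified mechanically instead of by reading the charts. The sketch below is an assumption-laden illustration: all names are hypothetical, pitch numbers follow the C = 1 through B = 12 convention of Sect. 3 (a root of 0 is treated as equivalent to 12), the minor-to-major direction of each operation is taken to be the inverse of the major-to-minor one, and a word such as PLR is applied left to right.

```python
def chord_type(root, quality, m=4):
    """Type label for a triad; quality is 'M' (major) or 'm' (minor)."""
    third = 4 if quality == "M" else 3
    tones = (root, root + third, root + 7)
    labels = []
    for k in (m, 12 - m):
        residues = sorted((k * t) % 12 for t in tones)
        labels.append("".join(str(r) for r in residues))
    return "/".join(sorted(labels))


def P(chord):
    root, quality = chord
    return (root, "m" if quality == "M" else "M")


def R(chord):
    root, quality = chord
    return ((root - 3) % 12, "m") if quality == "M" else ((root + 3) % 12, "M")


def L(chord):
    root, quality = chord
    return ((root + 4) % 12, "m") if quality == "M" else ((root - 4) % 12, "M")


OPS = {"P": P, "L": L, "R": R}


def apply_word(word, chord):
    """Apply the letters of word from left to right (an assumed convention)."""
    for letter in word:
        chord = OPS[letter](chord)
    return chord


def preserves(word, category, m=4):
    """True if word maps every chord of the category back into the category."""
    members = [(r, q) for r in range(1, 13) for q in "Mm"
               if chord_type(r, q, m) == category]
    for chord in members:
        new_root, new_quality = apply_word(word, chord)
        if chord_type(new_root, new_quality, m) != category:
            return False
    return True


print(preserves("PR", "448/488"), preserves("L", "448/488"))
```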
Abstract. This paper presents a generalization of the well-known neo-Riemannian
group PLR to the classical five types of seventh chord (dominant,
minor, half-diminished, major, diminished), considered as tetrachords
with a marked root, and proves that it is isomorphic to the
abstract group $S_5 \ltimes \mathbb{Z}_{12}^4$. This group includes as subgroups the PLR
group and several other groups that have already appeared in the literature.


Keywords: Transformational theory • neo-Riemannian group • Semi-direct product • Seventh chord


1 Introduction 

Since the pioneering works by David Lewin [8,9] and Guerino Mazzola [10,11],
the main idea of transformational theory has been to model musical transformations
using algebraic structures. The most famous example is probably the neo-Riemannian
group PLR, which acts on the set of all 24 minor and major triads
of twelve-tone equal temperament and is abstractly isomorphic to the dihedral
group of order 24. It is generated by the P, L and R operations, which transform
major triads into minor triads (and vice versa) by shifting a single note by a semitone
or a whole tone. They were introduced by the 19th-century music theorist Hugo Riemann
[12] for pure intervals. Lewin rediscovered the PLR operations, defined
them in equal temperament, and gave birth to a branch of
transformational theory called neo-Riemannian theory.

Neo-Riemannian transformations can be modelled with several geometric
structures, of which the most important is the Tonnetz, first introduced by Euler
[4] and later studied by several musicologists of the 19th century, such as
Wilhelm Moritz Drobisch, Carl Ernst Naumann, Arthur von Oettingen and
Hugo Riemann himself. From a mathematical point of view the Tonnetz is an
infinite two-dimensional simplicial complex which tiles the plane with triangles,
where 0-simplices represent pitch classes and 2-simplices identify major and
minor triads; the relative position of 2-simplices also makes it a natural tool in
the theory of parsimonious voice leading.

© Springer International Publishing AG 2017 

O. A. Agustín-Aquino et al. (Eds.): MCM 2017, LNAI 10527, pp. 13-25, 2017.
https://doi.org/10.1007/978-3-319-71827-9_2 
In addition to triads, seventh chords are often used in the music literature.
A natural question arises: can we define a group similar to the neo-Riemannian
group PLR acting on the set of seventh chords (of the twelve-tone equal temperament)?
More precisely: can we define a group of transformations between
seventh chords to describe parsimonious voice leading, so that the generators fix
three notes and move a single note by a semitone or a whole tone? Problems on
relationships between seventh chords were studied by Childs [2], Gollin [6],
Fiore and Satyendra [5], Arnett and Barth [1] and Kerkez [?] for some of
the types of seventh chords. In this paper we will extend their studies by considering
all five “classical” types of seventh chords: dominant, minor, half-diminished,
major, diminished.

In Sect. 2 we provide some preliminaries about the neo-Riemannian group.
Section 3 briefly presents the known results about the generalization of the PLR
group to seventh chords, and a classification of all transformations between seventh
chords shifting a single note by a semitone or a whole tone. In the fourth
and final section we will define the PLRQ group, generalizing the PLR group,
and we identify its abstract algebraic structure.

2 The neo-Riemannian Group PLR 

The neo-Riemannian group PLR is generated by the following P, L and R 
operations. 

- P (“Parallel”): if the triad is major, P moves the third down a semitone, 
while if the triad is minor P moves the third up a semitone. 

- L (“Leading-Tone”): if the triad is major L moves the root down a semitone, 
while if the triad is minor L moves the fifth up a semitone. 

- R (“Relative”): if the triad is major, R moves the fifth up a whole tone, while 
if the triad is minor R moves the root down a whole tone. 
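
The three definitions above can be sketched directly as voice movements. The following Python illustration is not taken from the paper: the tuple representation (root, third, fifth), the helper names, and the numbering C = 0 through B = 11 are assumptions made for the example.

```python
def is_major(triad):
    """A triad is stored as (root, third, fifth) in pitch classes mod 12."""
    root, third, _ = triad
    return (third - root) % 12 == 4


def P(triad):
    """Parallel: move the third down (major) or up (minor) a semitone."""
    root, third, fifth = triad
    shift = -1 if is_major(triad) else 1
    return (root, (third + shift) % 12, fifth)


def L(triad):
    """Leading-tone: major root down a semitone, minor fifth up a semitone."""
    root, third, fifth = triad
    if is_major(triad):
        # The lowered root becomes the fifth of the new minor triad.
        return (third, fifth, (root - 1) % 12)
    # The raised fifth becomes the root of the new major triad.
    return ((fifth + 1) % 12, root, third)


def R(triad):
    """Relative: major fifth up a whole tone, minor root down a whole tone."""
    root, third, fifth = triad
    if is_major(triad):
        # The raised fifth becomes the root of the new minor triad.
        return ((fifth + 2) % 12, root, third)
    # The lowered root becomes the fifth of the new major triad.
    return (third, fifth, (root - 2) % 12)


ALL_TRIADS = [(r, (r + t) % 12, (r + 7) % 12) for r in range(12) for t in (4, 3)]

c_major = (0, 4, 7)
print(P(c_major))  # (0, 3, 7): C minor
print(L(c_major))  # (4, 7, 11): E minor
print(R(c_major))  # (9, 0, 4): A minor
```

Each operation is an involution on the 24 triads, which the test below checks exhaustively.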

There exist many ways to represent such transformations algebraically or
geometrically. We will denote pitch classes by elements of the cyclic group of 12
elements $\mathbb{Z}/12\mathbb{Z}$ (or, more briefly, $\mathbb{Z}_{12}$) and n-chords by n-tuples of pitch classes in
brackets, ordered in the ascending direction (as induced by the linear order of
pitches) and starting from some pitch class of reference.

In this notation, Crans, Fiore and Satyendra [3] use twelve equally-spaced
points on a circle to represent pitch classes and relate the above operations to an
inversion operation $I_{k+h}$ as follows. Let $S$ be the set of all 24 minor and major
triads $\{[x_1, x_2, x_3] \mid x_1, x_2, x_3 \in \mathbb{Z}_{12},\ x_2 = x_1 + 3 \text{ or } x_2 = x_1 + 4,\ x_3 = x_1 + 7\}$;
then

$P([x_1, x_2, x_3]) = I_{x_1+x_3}([x_1, x_2, x_3])$    (1)
$R([x_1, x_2, x_3]) = I_{x_1+x_2}([x_1, x_2, x_3])$    (2)
$L([x_1, x_2, x_3]) = I_{x_2+x_3}([x_1, x_2, x_3])$    (3)
Fig. 1. P(C) = c    Fig. 2. R(C) = a    Fig. 3. L(C) = e


where $I_{k+h}$ is the reflection of the circle across the perpendicular bisector of the segment joining k and h, so that it exchanges k and h. As depicted in Figs. 1, 2 and 3, when applied to the triad of C major, P gives c (C minor), R gives a (A minor) and L gives e (E minor).
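
Equations (1)-(3) can be checked numerically. The sketch below assumes the standard pitch-class inversion $I_n(x) = n - x \pmod{12}$, which the text does not spell out; the function names are hypothetical.

```python
def invert(n, chord):
    """Apply I_n(x) = n - x (mod 12) to every pitch class of the chord."""
    return frozenset((n - x) % 12 for x in chord)

def P_inv(x1, x2, x3):
    return invert(x1 + x3, (x1, x2, x3))   # equation (1)

def R_inv(x1, x2, x3):
    return invert(x1 + x2, (x1, x2, x3))   # equation (2)

def L_inv(x1, x2, x3):
    return invert(x2 + x3, (x1, x2, x3))   # equation (3)

c_major = (0, 4, 7)
print(sorted(P_inv(*c_major)))  # pitch classes of C minor
print(sorted(R_inv(*c_major)))  # pitch classes of A minor
print(sorted(L_inv(*c_major)))  # pitch classes of E minor
```

Each inversion exchanges the two notes named in its index and moves only the remaining note, which is why the result shares two pitch classes with the original triad.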


Another way to define the P, L and R operations is proposed by Arnett and Barth [1]:

\[ P : M \leftrightarrow m \qquad\quad P : [x, x+4, x+7] \leftrightarrow [x, x+3, x+7] \tag{4} \]
\[ R : M \leftrightarrow m - 3 \qquad R : [x, x+4, x+7] \leftrightarrow [x, x+4, x+9] \tag{5} \]
\[ L : M \leftrightarrow m + 4 \qquad L : [x, x+4, x+7] \leftrightarrow [x-1, x+4, x+7] \tag{6} \]

where M represents a major triad, m a minor one, and −3 and 4 are the numbers of semitones to be added to each component of the parallel triad (here, as in the whole paper, sums are taken mod 12, i.e. in the group $\mathbb{Z}_{12}$).

It is an easy calculation to verify that P is obtained as RLRLRLR; therefore the PLR group is in fact generated by R and L. The isomorphism to the dihedral group of order 24 becomes apparent by noting that the element RPLP is a translation up a semitone and therefore has order 12. We can visualize these operations in the neo-Riemannian Tonnetz (see Fig. 4), a simplicial complex where the vertices represent pitch classes and in which notes connected by a horizontal segment have intervals of a perfect fifth, while the other two directions represent major and minor thirds. Triangles sharing an edge represent triads that share two notes, while the third one differs only by a semitone or a whole tone. We can observe that reflections preserving a triangle’s edge represent P, L and R operations (and realize a parsimonious voice leading).

Fig. 4. Reflections preserving a triangle’s edge in the Tonnetz represent the P, L and R operations
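
The algebraic claims about the PLR group (P = RLRLRLR, generation by R and L, order 24, and an element of order 12) can be checked by brute force. The following is a minimal sketch that uses the Arnett and Barth description of P, R and L as shifts between a major triad and a minor triad; the encoding and helper names are assumptions, and composite words are applied from right to left.

```python
# A triad is (root, quality); the 24 triads are indexed by their position in TRIADS.
TRIADS = [(r, q) for q in "Mm" for r in range(12)]

def make_op(shift):
    """Involution sending M at x to m at x+shift, and m at x to M at x-shift."""
    def op(triad):
        root, quality = triad
        if quality == "M":
            return ((root + shift) % 12, "m")
        return ((root - shift) % 12, "M")
    return op

P, R, L = make_op(0), make_op(-3), make_op(4)

def as_perm(*ops):
    """Permutation of TRIADS induced by a word, applied from right to left."""
    def apply(triad):
        for op in reversed(ops):
            triad = op(triad)
        return triad
    return tuple(TRIADS.index(apply(t)) for t in TRIADS)

def compose(p, q):
    """Permutation 'p after q'."""
    return tuple(p[i] for i in q)

def generate(gens):
    """Closure of the generators under composition (all generators are involutions)."""
    identity = tuple(range(len(TRIADS)))
    group, frontier = {identity}, [identity]
    while frontier:
        g = frontier.pop()
        for h in gens:
            new = compose(h, g)
            if new not in group:
                group.add(new)
                frontier.append(new)
    return group

def order(p):
    identity = tuple(range(len(p)))
    n, q = 1, p
    while q != identity:
        q = compose(p, q)
        n += 1
    return n

print(as_perm(R, L, R, L, R, L, R) == as_perm(P))
print(len(generate([as_perm(R), as_perm(L)])))
print(order(as_perm(R, P, L, P)))
```

Because the word RLRLRLR is a palindrome, the check of P = RLRLRLR does not depend on the reading direction of the word.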

3 Transformations Between Seventh Chords 

Childs investigated transformational parsimonious voice leading between dominant and half-diminished sevenths in [2]. In particular he studied transformations that fix two notes and move the other two notes by a semitone or a whole tone.

Gollin also studied the relationships between the same types of seventh chords [6]. He introduced a possible three-dimensional expansion of the Tonnetz in which horizontal planes contain copies of the traditional Tonnetz, while segments in a chosen direction outside the plane represent intervals of a minor seventh. While the Tonnetz tiles the plane with triangles, its three-dimensional expansion tiles three-dimensional Euclidean space with tetrahedra, representing dominant and half-diminished seventh chords, and triangular prisms (not representing chords). There are six transformations between tetrahedra sharing a common edge: they are represented spatially as a “flip” of the two tetrahedra about their common edge. Each “edge-flip” maintains at least the two notes represented by the two vertices of the common edge, and in one case the two tetrahedra share three notes (see Fig. 5).



Fig. 5. The six edge-flips between tetrahedra. The flip in the upper right is the only one in which the tetrahedra represent seventh chords sharing three common notes

Arnett and Barth [1] start from the three-dimensional expansion of the Tonnetz introduced by Gollin and observe that Gollin’s study does not include the minor seventh chords, which are very common in the music literature. Therefore they propose to consider a set of 36 chords consisting of all dominant, half-diminished and minor seventh chords and to find the transformations between them that maintain three common notes. They define the following five operations:


\[
\begin{array}{lll}
P_1 : D \leftrightarrow m & P_1 : & [x, x+4, x+7, x+10] \leftrightarrow [x, x+3, x+7, x+10] \\
P_2 : m \leftrightarrow hd & P_2 : & [x, x+3, x+7, x+10] \leftrightarrow [x, x+3, x+6, x+10] \\
R_1 : D \leftrightarrow m - 3 & R_1 : & [x, x+4, x+7, x+10] \leftrightarrow [x, x+4, x+7, x+9] \\
R_2 : m \leftrightarrow hd - 3 & R_2 : & [x, x+3, x+7, x+10] \leftrightarrow [x, x+3, x+7, x+9] \\
L : D \leftrightarrow hd + 4 & L : & [x, x+4, x+7, x+10] \leftrightarrow [x+2, x+4, x+7, x+10]
\end{array}
\]


The first four transformations move a single note by a semitone, whereas L 
shifts a note by a whole tone. In fact, L is the algebraic formalization of the 
edge-flip between tetrahedra representing seventh chords with three common 
notes described in Gollin’s three-dimensional Tonnetz. 

Although this study includes more types of seventh chords than those of Childs and Gollin, other important types of seventh chords are not considered and the algebraic structure of these transformations is not analyzed.

Kerkez proposes a way to extend the PLR group to seventh chords in [7].

Let H be the set of major and minor seventh chords, that is,

\[
\begin{aligned}
H = {} & \{[x_1, x_2, x_3, x_4] \mid x_1, x_2, x_3, x_4 \in \mathbb{Z}_{12},\ x_2 = x_1 + 4,\ x_3 = x_1 + 7,\ x_4 = x_1 + 11\} \ \cup \\
& \{[x_1, x_2, x_3, x_4] \mid x_1, x_2, x_3, x_4 \in \mathbb{Z}_{12},\ x_2 = x_1 + 3,\ x_3 = x_1 + 7,\ x_4 = x_1 + 10\}
\end{aligned}
\]

Kerkez defines the following two maps $P, S : H \to H$:

\[ P[a, b, c, d] = [(\mathrm{type}[a, b, c, d]) \cdot 2 + d,\ a,\ b,\ c] \]
\[ S[a, b, c, d] = [b,\ c,\ d,\ (-1) \cdot (\mathrm{type}[a, b, c, d]) \cdot 2 + a] \]


where 


type(t) = 




if t is a minor seventh 
if t is a major seventh 


P maps each major seventh to its relative minor seventh moving the seventh 
down a whole tone. Vice versa, it maps each minor seventh to its relative major 
seventh moving the root up a whole tone. 

S maps each major seventh to the minor seventh having root 4 semitones up, 
moving its root up a whole tone. Vice versa, it maps each minor seventh to the 
major seventh having root 4 semitones down, moving its seventh down a whole 
tone. 

Kerkez proves that the transformations P and S act on H generating a group again isomorphic to the dihedral group $D_{12}$ of order 24.

In his work, Kerkez considers only major and minor seventh chords. But, as he noted in his conclusions, these transformations are just two of the possible operations between seventh chords.





S. Cannas et al. 


3.1 Transformations Between Seventh Chords 

We want to find all transformations between seventh chords describing parsimonious voice leading, i.e. those that fix three notes and move only one note by a semitone or a whole tone. We consider the following types of seventh chords: dominant (D), minor (m), half-diminished (hd), major (M) and diminished (d), and let H be the set of all seventh chords of these 5 types. We first analyze transformations moving just one note by one semitone: if it exists, let us call $Q_{i+}$ the map that sends each type of seventh chord to another type by moving the i-th member up a semitone, where $i = R, T, F, S$ depending on whether the member is considered to be the root (R), the third (T), the fifth (F) or the seventh (S), respectively. Likewise, let $Q_{i-}$ be the map that moves the i-th member down a semitone. We have the following:


\[
\begin{array}{l|ccccc}
 & D & m & hd & M & d \\ \hline
Q_{R+} & d & D & m & hd & hd \\
Q_{R-} & - & - & M & - & D \\
Q_{T+} & - & D & - & - & hd \\
Q_{T-} & m & - & - & - & D \\
Q_{F+} & - & - & m & - & hd \\
Q_{F-} & - & hd & - & - & D \\
Q_{S+} & M & - & - & - & hd \\
Q_{S-} & m & hd & d & D & D
\end{array}
\]

The entry in the row of a map Q and the column of a type X is the type Q(X); for example $Q_{R+}(D) = d$. The maps that do not produce any of the classical types of seventh chords are marked with a dash. We observe that some transformations are inverse to each other:


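\[
\begin{array}{lll}
Q_{R+}(M) = hd & Q_{R-}(hd) = M & \Rightarrow\ Q_R : M \leftrightarrow hd \\
Q_{R+}(m) = D & Q_{S-}(D) = m & \Rightarrow\ Q_R, Q_S : D \leftrightarrow m \\
Q_{R+}(hd) = m & Q_{S-}(m) = hd & \Rightarrow\ Q_R, Q_S : hd \leftrightarrow m \\
Q_{S+}(D) = M & Q_{S-}(M) = D & \Rightarrow\ Q_S : D \leftrightarrow M \\
Q_{T+}(m) = D & Q_{T-}(D) = m & \Rightarrow\ Q_T : m \leftrightarrow D \\
Q_{F+}(hd) = m & Q_{F-}(m) = hd & \Rightarrow\ Q_F : hd \leftrightarrow m
\end{array}
\]

It remains to consider the following operations:

\[
\begin{array}{lllll}
Q_{R+}(D) = d & Q_{R-}(d) = D & Q_{T-}(d) = D & Q_{F-}(d) = D & Q_{S-}(d) = D \\
Q_{S-}(hd) = d & Q_{S+}(d) = hd & Q_{R+}(d) = hd & Q_{T+}(d) = hd & Q_{F+}(d) = hd
\end{array}
\]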

$Q_{R+}(D) = d$ is the inverse of $Q_{R-}(d) = D$, $Q_{T-}(d) = D$, $Q_{F-}(d) = D$ and $Q_{S-}(d) = D$. This is due to the particular symmetry of the interval structure of diminished sevenths, in which the members of the chord play an identical role: for example the diminished seventh $C\sharp^{\circ 7} = [C\sharp, E, G, B\flat]$ acoustically coincides with the diminished seventh $E^{\circ 7} = [E, G, B\flat, D\flat]$ because they are enharmonically equivalent. Unlike the other four types, there would be only three diminished sevenths (and not twelve), e.g. C, C♯, D, because the other nine chords are enharmonic to them three by three. This explains why we have four transformations that have the same inverse. To obtain a set of well-defined musical transformations, we will consider the diminished sevenths as 12 distinct chords, using the marked root to distinguish them. Hence we have 4 transformations between diminished and half-diminished seventh chords and 4 transformations between diminished and dominant seventh chords:


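\[
\begin{array}{lll}
Q_{S-}(hd) = d & Q_{R+}(d) = hd & \Rightarrow\ Q_R, Q_S : hd \leftrightarrow d \\
Q_{S-}(hd) = d & Q_{T+}(d) = hd & \Rightarrow\ Q_T, Q_S : hd \leftrightarrow d \\
Q_{S-}(hd) = d & Q_{F+}(d) = hd & \Rightarrow\ Q_F, Q_S : hd \leftrightarrow d \\
Q_{S-}(hd) = d & Q_{S+}(d) = hd & \Rightarrow\ Q_S : hd \leftrightarrow d \\
Q_{R+}(D) = d & Q_{R-}(d) = D & \Rightarrow\ Q_R : D \leftrightarrow d \\
Q_{R+}(D) = d & Q_{T-}(d) = D & \Rightarrow\ Q_R, Q_T : D \leftrightarrow d \\
Q_{R+}(D) = d & Q_{F-}(d) = D & \Rightarrow\ Q_R, Q_F : D \leftrightarrow d \\
Q_{R+}(D) = d & Q_{S-}(d) = D & \Rightarrow\ Q_R, Q_S : D \leftrightarrow d
\end{array}
\]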


Now we consider the transformations that move a single note by a whole tone. Analogously to what was done above, if they exist let us call $Q_{i++}$ the map which sends each type of seventh chord to another type by moving the i-th member up a whole tone, and $Q_{i--}$ the map which moves the i-th member down a whole tone. We obtain another classical type of seventh chord only by moving the root up a whole tone or the seventh down a whole tone:

\[
\begin{array}{lll}
Q_{R++}(D) = hd & Q_{R++}(m) = M & Q_{R++}(M) = m \\
Q_{S--}(m) = M & Q_{S--}(hd) = D & Q_{S--}(M) = m
\end{array}
\]

The remaining maps of this kind, such as $Q_{R++}(hd)$, do not produce a classical type. Again, we find some transformations that are inverses of one another:

\[
\begin{array}{lll}
Q_{R++}(D) = hd & Q_{S--}(hd) = D & \Rightarrow\ Q_R, Q_S : D \leftrightarrow hd \\
Q_{R++}(m) = M & Q_{S--}(M) = m & \Rightarrow\ Q_R, Q_S : m \leftrightarrow M \\
Q_{R++}(M) = m & Q_{S--}(m) = M & \Rightarrow\ Q_R, Q_S : M \leftrightarrow m
\end{array}
\]

Overall we have 17 transformations corresponding to a parsimonious voice leading among our 5 types of seventh chords.
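
The count of 17 can be reproduced by exhaustive search. The sketch below is an illustration under stated assumptions: chord types are encoded by their semitone offsets above the root, a transformation is recorded as a triple (type A, type B, shift) meaning "A on root x is exchanged with B on root x + shift", and a diminished seventh is counted once for each of its four possible marked roots, as in the text. All names are hypothetical.

```python
from itertools import product

# Semitone offsets above the root for the five classical seventh-chord types.
SHAPES = {
    "D": (0, 4, 7, 10),
    "m": (0, 3, 7, 10),
    "hd": (0, 3, 6, 10),
    "M": (0, 4, 7, 11),
    "d": (0, 3, 6, 9),
}
ORDER = list(SHAPES)

def readings(pcs):
    """All (type, root) pairs whose pitch-class set equals pcs."""
    return [(name, root)
            for name, shape in SHAPES.items()
            for root in sorted(pcs)
            if {(root + s) % 12 for s in shape} == pcs]

def parsimonious_transformations():
    """Unordered exchanges obtained by moving one note by 1 or 2 semitones."""
    found = set()
    for name, shape in SHAPES.items():
        for index, delta in product(range(4), (-2, -1, 1, 2)):
            moved = list(shape)                      # source chord on root 0
            moved[index] = (moved[index] + delta) % 12
            pcs = set(moved)
            if len(pcs) < 4:                         # two notes collapsed
                continue
            for target, root in readings(pcs):
                a, b, shift = name, target, root
                if ORDER.index(a) > ORDER.index(b):  # store each exchange once
                    a, b, shift = b, a, (-shift) % 12
                found.add((a, b, shift))
    return found

edges = parsimonious_transformations()
print(len(edges))
for edge in sorted(edges):
    print(edge)
```

Under these assumptions the search finds four exchanges between D and d, four between hd and d, and nine among the remaining types.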

We want to define these transformations in a similar way to the neo-Riemannian operations. We will use Arnett and Barth’s notation, but we want to formalize it more precisely.

Definition 1. We define a cyclic marked chord $[x_1, x_2, \dots, x_n]$ as a chord constituted by the n musical notes $x_1, x_2, \dots, x_n$, so that acoustically $[x_1, x_2, \dots, x_n] = [x_2, \dots, x_n, x_1] = \dots = [x_n, x_1, \dots, x_{n-1}]$, where $x_i \in \mathbb{Z}_{12}$ and the note corresponding to the root of the chord is underlined.

As above, in cyclic marked chords all notes will be expressed in terms of a single note by adding or subtracting the appropriate number of semitones.
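We start by defining a parallel operation P for seventh chords. Let $P_{ij} : H \to H$ be the maps which send an i-th type of seventh chord to a j-th type of seventh chord, $1 \le i, j \le 5$ and $i \ne j$, and vice versa, and that fix the other types (the types being numbered in the order D, m, hd, M, d). 4 of the 17 transformations are parallel operations:

\[
\begin{array}{lll}
Q_T : D \leftrightarrow m & \Leftrightarrow\ P_{12} : & [\underline{x}, x+4, x+7, x+10] \leftrightarrow [\underline{x}, x+3, x+7, x+10] \\
Q_S : D \leftrightarrow M & \Leftrightarrow\ P_{14} : & [\underline{x}, x+4, x+7, x+10] \leftrightarrow [\underline{x}, x+4, x+7, x+11] \\
Q_F : m \leftrightarrow hd & \Leftrightarrow\ P_{23} : & [\underline{x}, x+3, x+7, x+10] \leftrightarrow [\underline{x}, x+3, x+6, x+10] \\
Q_S : hd \leftrightarrow d & \Leftrightarrow\ P_{35} : & [\underline{x}, x+3, x+6, x+10] \leftrightarrow [\underline{x}, x+3, x+6, x+9]
\end{array}
\]

Remark 1. $P_{12}$ and $P_{23}$ coincide with $P_1$ and $P_2$ defined by Arnett and Barth.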

Now we consider a relative operation R. We observe that if the triad is major $R = P \circ T_{-3} = T_{-3} \circ P$, while if it is minor $R = P \circ T_3 = T_3 \circ P$. Then let $R_{ij} : H \to H$ be the maps which send an i-th type of seventh chord to a j-th type of seventh chord transposed three semitones down, a j-th type of seventh to an i-th type of seventh transposed three semitones up, and fix the other types. Then:

\[ R_{ij} = T_{\pm 3} \circ P_{ij} = P_{ij} \circ T_{\pm 3} \qquad \forall i, j \in \{1, 2, 3, 4, 5\} \tag{7} \]

Now, 5 of the 17 transformations are relative operations:

\[
\begin{array}{lll}
Q_R, Q_S : D \leftrightarrow m - 3 & \Leftrightarrow\ R_{12} : & [\underline{x}, x+4, x+7, x+10] \leftrightarrow [x, x+4, x+7, \underline{x+9}] \\
Q_R, Q_S : m \leftrightarrow hd - 3 & \Leftrightarrow\ R_{23} : & [\underline{x}, x+3, x+7, x+10] \leftrightarrow [x, x+3, x+7, \underline{x+9}] \\
Q_R, Q_S : M \leftrightarrow m - 3 & \Leftrightarrow\ R_{42} : & [\underline{x}, x+4, x+7, x+11] \leftrightarrow [x, x+4, x+7, \underline{x+9}] \\
Q_R, Q_S : hd \leftrightarrow d - 3 & \Leftrightarrow\ R_{35} : & [\underline{x}, x+3, x+6, x+10] \leftrightarrow [x, x+3, x+6, \underline{x+9}] \\
Q_F, Q_S : d \leftrightarrow hd - 3 & \Leftrightarrow\ R_{53} : & [\underline{x}, x+3, x+6, x+9] \leftrightarrow [x, x+3, x+7, \underline{x+9}]
\end{array}
\]

Remark 2. $R_{12}$ and $R_{23}$ coincide with $R_1$ and $R_2$ defined by Arnett and Barth. Moreover $R_{42}$ coincides with the map P defined by Kerkez.

For the operation L we observe that if the triad is major $L = P \circ T_4 = T_4 \circ P$, while if it is minor $L = P \circ T_{-4} = T_{-4} \circ P$. Then let $L_{ij} : H \to H$ be the maps which send an i-th type of seventh chord to a j-th type of seventh chord transposed four semitones up, a j-th type of seventh to an i-th type of seventh transposed four semitones down, and fix the other types. Then:

\[ L_{ij} = T_{\pm 4} \circ P_{ij} = P_{ij} \circ T_{\pm 4} \qquad \forall i, j \in \{1, 2, 3, 4, 5\} \tag{8} \]

This time, 3 of the 17 transformations are L operations:

\[
\begin{array}{lll}
Q_R, Q_S : D \leftrightarrow hd + 4 & \Leftrightarrow\ L_{13} : & [\underline{x}, x+4, x+7, x+10] \leftrightarrow [x+2, \underline{x+4}, x+7, x+10] \\
Q_R, Q_S : D \leftrightarrow d + 4 & \Leftrightarrow\ L_{15} : & [\underline{x}, x+4, x+7, x+10] \leftrightarrow [x+1, \underline{x+4}, x+7, x+10] \\
Q_R, Q_S : M \leftrightarrow m + 4 & \Leftrightarrow\ L_{42} : & [\underline{x}, x+4, x+7, x+11] \leftrightarrow [x+2, \underline{x+4}, x+7, x+11]
\end{array}
\]
Remark 3. $L_{13}$ coincides with L defined by Arnett and Barth and with the “edge-flip” described by Gollin in his three-dimensional Tonnetz.

$L_{42}$ coincides with S defined by Kerkez.

We have identified 12 of the 17 transformations between seventh chords as operations similar to P, L and R. We now see that the other transformations correspond to new operations obtained by the composition of a parallel transformation and a transposition (with a number of semitones different from 3 and 4).

We denote by: 

- $Q_{ij}$ the maps which send an i-th type of seventh chord to a j-th type of seventh chord transposed one semitone up, a j-th type of seventh to an i-th type of seventh transposed one semitone down, and fix the other types;

- $RR_{ij}$ the maps which send an i-th type of seventh chord to a j-th type of seventh chord transposed six semitones, and fix the other types;

- $QQ_{ij}$ the maps which send an i-th type of seventh chord to a j-th type of seventh chord transposed two semitones up, a j-th type of seventh to an i-th type of seventh transposed two semitones down, and fix the other types;

- $N_{ij}$ the maps which send an i-th type of seventh chord to a j-th type of seventh chord transposed five semitones up, a j-th type of seventh to an i-th type of seventh transposed five semitones down, and fix the other types.

With these transformations we can define the missing operations in the following 
way: 


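\[
\begin{array}{lll}
Q_R : M \leftrightarrow hd + 1 & \Leftrightarrow\ Q_{43} : & [\underline{x}, x+4, x+7, x+11] \leftrightarrow [\underline{x+1}, x+4, x+7, x+11] \\
Q_R : D \leftrightarrow d + 1 & \Leftrightarrow\ Q_{15} : & [\underline{x}, x+4, x+7, x+10] \leftrightarrow [\underline{x+1}, x+4, x+7, x+10] \\
Q_T, Q_S : hd \leftrightarrow d - 6 & \Leftrightarrow\ RR_{35} : & [\underline{x}, x+3, x+6, x+10] \leftrightarrow [x, x+3, \underline{x+6}, x+9] \\
Q_R, Q_T : d \leftrightarrow D + 2 & \Leftrightarrow\ QQ_{51} : & [\underline{x}, x+3, x+6, x+9] \leftrightarrow [x, \underline{x+2}, x+6, x+9] \\
Q_R, Q_F : d \leftrightarrow D + 5 & \Leftrightarrow\ N_{51} : & [\underline{x}, x+3, x+6, x+9] \leftrightarrow [x, x+3, \underline{x+5}, x+9]
\end{array}
\]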
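
As a consistency check, the 17 named operations can be encoded as "type i on root x is exchanged with type j on root x + t" and tested for parsimony. This is a minimal sketch: the dictionary `OPS`, the numbering 1 = D, 2 = m, 3 = hd, 4 = M, 5 = d, and the helper names are assumptions made for this illustration.

```python
# Offsets above the root, types numbered 1 = D, 2 = m, 3 = hd, 4 = M, 5 = d.
SHAPES = {1: (0, 4, 7, 10), 2: (0, 3, 7, 10), 3: (0, 3, 6, 10),
          4: (0, 4, 7, 11), 5: (0, 3, 6, 9)}

# name: (i, j, t) means "type i on root x <-> type j on root x + t".
OPS = {
    "P12": (1, 2, 0), "P14": (1, 4, 0), "P23": (2, 3, 0), "P35": (3, 5, 0),
    "R12": (1, 2, -3), "R23": (2, 3, -3), "R42": (4, 2, -3),
    "R35": (3, 5, -3), "R53": (5, 3, -3),
    "L13": (1, 3, 4), "L15": (1, 5, 4), "L42": (4, 2, 4),
    "Q43": (4, 3, 1), "Q15": (1, 5, 1), "RR35": (3, 5, 6),
    "QQ51": (5, 1, 2), "N51": (5, 1, 5),
}

def pitch_classes(chord):
    kind, root = chord
    return frozenset((root + s) % 12 for s in SHAPES[kind])

def apply_op(name, chord):
    """Apply a named operation to a chord given as (type, marked root)."""
    i, j, t = OPS[name]
    kind, root = chord
    if kind == i:
        return (j, (root + t) % 12)
    if kind == j:
        return (i, (root - t) % 12)
    return chord                      # other types are fixed

def moved_interval(a, b):
    """Size in semitones of the single moved note, or None if more than one note differs."""
    out, into = pitch_classes(a) - pitch_classes(b), pitch_classes(b) - pitch_classes(a)
    if len(out) != 1 or len(into) != 1:
        return None
    d = (next(iter(into)) - next(iter(out))) % 12
    return min(d, 12 - d)

for name, (i, _, _) in OPS.items():
    print(name, moved_interval((i, 0), apply_op(name, (i, 0))))
```

Every operation moves exactly one note, by a semitone in fourteen cases and by a whole tone for $R_{42}$, $L_{13}$ and $L_{42}$.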


Remark 4. Crans, Fiore and Satyendra define P, L and R as inversions $I_n$; since inversions are isometries, they leave lengths and angles unchanged, and minor and major triads are represented geometrically by triangles whose edge lengths correspond to 3, 4 and 5 semitones. This idea could in principle also be used to define transformations between seventh chords, but it cannot be applied to all types, since the lengths of the edges and the angles of the quadrilaterals that represent them are not equal. We have only 2 quadrilaterals that are isometric: the one representing the dominant sevenths and the one representing half-diminished sevenths. There exists a unique transformation between these types of seventh chords, $L_{13}$.

To visualize the 17 transformations just defined we can represent them in a graph whose vertices represent the types of seventh chord and whose edges represent the transformations between them. Therefore we have 5 vertices and 17 edges (Fig. 6).

Fig. 6. The graph representing the 17 transformations between seventh chords.


4 The PLRQ Group 


Let PLRQ be the group generated by the 17 transformations among seventh chords. Each transformation $t \in PLRQ$ exchanges two types of sevenths and fixes the others, thus we can associate to it a permutation of $S_5$ (more precisely, a transposition). This information is not sufficient to identify the transformation: to identify it, we add a vector $v \in \mathbb{Z}_{12}^5$, in which the i-th component, $i \in \{1, \dots, 5\}$, is the number of semitones by which the root of the chord of type i has to be shifted to become the root of the chord of type j. It is easy to see that in this way no ambiguity is possible.

We write the 17 transformations between seventh chords as pairs of elements $(\sigma, v) \in S_5 \times \mathbb{Z}_{12}^5$ explicitly:


\[
\begin{array}{lll}
P_{12} : & [\underline{x}, x+4, x+7, x+10] \leftrightarrow [\underline{x}, x+3, x+7, x+10] & (\sigma, v) = ((12), (0, 0, 0, 0, 0)) \\
P_{14} : & [\underline{x}, x+4, x+7, x+10] \leftrightarrow [\underline{x}, x+4, x+7, x+11] & (\sigma, v) = ((14), (0, 0, 0, 0, 0)) \\
P_{23} : & [\underline{x}, x+3, x+7, x+10] \leftrightarrow [\underline{x}, x+3, x+6, x+10] & (\sigma, v) = ((23), (0, 0, 0, 0, 0)) \\
P_{35} : & [\underline{x}, x+3, x+6, x+10] \leftrightarrow [\underline{x}, x+3, x+6, x+9] & (\sigma, v) = ((35), (0, 0, 0, 0, 0)) \\
R_{12} : & [\underline{x}, x+4, x+7, x+10] \leftrightarrow [x, x+4, x+7, \underline{x+9}] & (\sigma, v) = ((12), (-3, 3, 0, 0, 0)) \\
R_{23} : & [\underline{x}, x+3, x+7, x+10] \leftrightarrow [x, x+3, x+7, \underline{x+9}] & (\sigma, v) = ((23), (0, -3, 3, 0, 0)) \\
R_{42} : & [\underline{x}, x+4, x+7, x+11] \leftrightarrow [x, x+4, x+7, \underline{x+9}] & (\sigma, v) = ((42), (0, 3, 0, -3, 0)) \\
R_{35} : & [\underline{x}, x+3, x+6, x+10] \leftrightarrow [x, x+3, x+6, \underline{x+9}] & (\sigma, v) = ((35), (0, 0, -3, 0, 3)) \\
R_{53} : & [\underline{x}, x+3, x+6, x+9] \leftrightarrow [x, x+3, x+7, \underline{x+9}] & (\sigma, v) = ((53), (0, 0, 3, 0, -3)) \\
L_{13} : & [\underline{x}, x+4, x+7, x+10] \leftrightarrow [x+2, \underline{x+4}, x+7, x+10] & (\sigma, v) = ((13), (4, 0, -4, 0, 0)) \\
L_{15} : & [\underline{x}, x+4, x+7, x+10] \leftrightarrow [x+1, \underline{x+4}, x+7, x+10] & (\sigma, v) = ((15), (4, 0, 0, 0, -4)) \\
L_{42} : & [\underline{x}, x+4, x+7, x+11] \leftrightarrow [x+2, \underline{x+4}, x+7, x+11] & (\sigma, v) = ((42), (0, -4, 0, 4, 0)) \\
Q_{43} : & [\underline{x}, x+4, x+7, x+11] \leftrightarrow [\underline{x+1}, x+4, x+7, x+11] & (\sigma, v) = ((43), (0, 0, -1, 1, 0)) \\
Q_{15} : & [\underline{x}, x+4, x+7, x+10] \leftrightarrow [\underline{x+1}, x+4, x+7, x+10] & (\sigma, v) = ((15), (1, 0, 0, 0, -1)) \\
RR_{35} : & [\underline{x}, x+3, x+6, x+10] \leftrightarrow [x, x+3, \underline{x+6}, x+9] & (\sigma, v) = ((35), (0, 0, -6, 0, 6)) \\
QQ_{51} : & [\underline{x}, x+3, x+6, x+9] \leftrightarrow [x, \underline{x+2}, x+6, x+9] & (\sigma, v) = ((51), (-2, 0, 0, 0, 2)) \\
N_{51} : & [\underline{x}, x+3, x+6, x+9] \leftrightarrow [x, x+3, \underline{x+5}, x+9] & (\sigma, v) = ((51), (-5, 0, 0, 0, 5))
\end{array}
\]

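More precisely, we can represent each transformation $t \in PLRQ$ as an element of $S_5 \times Z$, where

\[ Z = \Big\{ v \in \mathbb{Z}_{12}^5 \ \Big|\ \sum_{i=1}^{5} v_i = 0 \Big\}, \]

since this is clearly true for all the 17 generators. The mapping thus defined becomes a group homomorphism if we define on this set the following operation:

\[
\begin{aligned}
(\sigma_k, v_k) \circ \dots \circ (\sigma_1, v_1)
&= \big(\sigma_k \cdots \sigma_1,\ v_1 + \sigma_1^{-1}(v_2) + (\sigma_2\sigma_1)^{-1}(v_3) + \dots + (\sigma_{k-1} \cdots \sigma_1)^{-1}(v_k)\big) \\
&= \big(\sigma_k \cdots \sigma_1,\ v_1 + \sigma_1^{-1}(v_2) + \sigma_1^{-1}\sigma_2^{-1}(v_3) + \dots + \sigma_1^{-1} \cdots \sigma_{k-1}^{-1}(v_k)\big)
\end{aligned}
\tag{9}
\]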

We want to prove that PLRQ is isomorphic to $S_5 \ltimes Z$. We recall the definition of the semidirect product of two subgroups.

Let G be a group. If G contains two subgroups H and K such that

(i) $G = HK$;

(ii) $K \trianglelefteq G$;

(iii) $H \cap K = 1$;

then G is the semidirect product of H and K. Conversely, given two groups H and K and a group homomorphism $\phi : H \to \mathrm{Aut}(K)$, we can construct a new group $H \ltimes K$ by defining on the cartesian product $H \times K$ the following operation:

\[ (h_1, k_1)(h_2, k_2) = (h_1 h_2,\ \phi_{h_2}(k_1) \cdot k_2) \]


Theorem 1. The group PLRQ is isomorphic to $S_5 \ltimes \mathbb{Z}_{12}^4$.

Proof. First of all we prove that PLRQ is isomorphic to $S_5 \ltimes Z$.

We observe that the subgroup formed by the elements $(Id, v)$ is normal. In fact, for all $(\sigma, v) \in S_5 \ltimes Z$, $(Id, v') \in \{Id\} \times Z$, we have

\[ (\sigma, v)(Id, v')(\sigma, v)^{-1} = (\sigma\sigma^{-1},\ -v + \sigma(v') + \sigma(v)) = (Id, v'') \in \{Id\} \times Z \]

On the other hand, since $S_5$ is generated by transpositions, it is easy to see that, calling O the origin in $\mathbb{Z}_{12}^5$, $S_5 \times \{O\} \le PLRQ$, since we already have in it $(P_{12}, O)$, $(P_{14}, O)$, $(P_{23}, O)$, $(P_{35}, O)$.
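To prove our thesis we are only left to see that there is a subgroup isomorphic to Z in PLRQ having trivial intersection with $S_5 \times \{O\}$. But this is exactly the subgroup of the elements of type $(Id, v)$. In fact we compute the permutations and vectors associated to $R_{42}L_{42}$, $P_{14}L_{42}P_{14}R_{12}$ and $P_{12}L_{13}P_{12}R_{23}$, where in each product the rightmost factor plays the role of $(\sigma_1, v_1)$ in (9).

\[
\begin{aligned}
R_{42}L_{42} &= (\sigma', v') \\
\sigma' &= \sigma_2\sigma_1 = (42)(42) = Id \\
v' &= v_1 + \sigma_1^{-1}(v_2) \\
&= (0, -4, 0, 4, 0) + (0, -3, 0, 3, 0) \\
&= (0, -7, 0, 7, 0)
\end{aligned}
\]

\[
\begin{aligned}
P_{14}L_{42}P_{14}R_{12} &= (\sigma'', v'') \\
\sigma'' &= \sigma_4\sigma_3\sigma_2\sigma_1 = (14)(42)(14)(12) = Id \\
v'' &= v_1 + \sigma_1^{-1}(v_2) + \sigma_1^{-1}\sigma_2^{-1}(v_3) + \sigma_1^{-1}\sigma_2^{-1}\sigma_3^{-1}(v_4) \\
&= (-3, 3, 0, 0, 0) + (0, 0, 0, 0, 0) + (-4, 4, 0, 0, 0) + (0, 0, 0, 0, 0) \\
&= (-7, 7, 0, 0, 0)
\end{aligned}
\]

\[
\begin{aligned}
P_{12}L_{13}P_{12}R_{23} &= (\sigma''', v''') \\
\sigma''' &= \sigma_4\sigma_3\sigma_2\sigma_1 = (12)(13)(12)(23) = Id \\
v''' &= v_1 + \sigma_1^{-1}(v_2) + \sigma_1^{-1}\sigma_2^{-1}(v_3) + \sigma_1^{-1}\sigma_2^{-1}\sigma_3^{-1}(v_4) \\
&= (0, -3, 3, 0, 0) + (0, 0, 0, 0, 0) + (0, -4, 4, 0, 0) + (0, 0, 0, 0, 0) \\
&= (0, -7, 7, 0, 0)
\end{aligned}
\]

With the elements just computed

\[
\begin{aligned}
R_{42}L_{42} &= (Id, (0, -7, 0, 7, 0)) \\
P_{14}L_{42}P_{14}R_{12} &= (Id, (-7, 7, 0, 0, 0)) \\
P_{12}L_{13}P_{12}R_{23} &= (Id, (0, -7, 7, 0, 0))
\end{aligned}
\tag{10}
\]

we can generate each element $(Id, (v_1, v_2, v_3, v_4, 0))$, with $(v_1, v_2, v_3, v_4, 0) \in \mathbb{Z}_{12}^5$ such that $\sum_i v_i = 0$. To see this, taking $a, b, c \in \mathbb{Z}$, we have to solve

\[
\begin{aligned}
a(0, -7, 0, 7, 0) + b(-7, 7, 0, 0, 0) + c(0, -7, 7, 0, 0) &= (v_1, v_2, v_3, v_4, 0) \pmod{12} \\
(-7b,\ -7a + 7b - 7c,\ 7c,\ 7a,\ 0) &= (v_1, v_2, v_3, v_4, 0) \pmod{12}
\end{aligned}
\]

that is,

\[
\begin{cases}
-7b \equiv v_1 \\
-7a + 7b - 7c \equiv v_2 \\
7c \equiv v_3 \\
7a \equiv v_4
\end{cases}
\quad\Rightarrow\quad
\begin{cases}
7b \equiv -v_1 \\
-v_1 - v_3 \equiv v_2 + v_4 \\
7c \equiv v_3 \\
7a \equiv v_4
\end{cases}
\]

which is solvable because 7 is coprime with 12 (the second condition holds automatically, since the components of the vector sum to 0).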
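
The three products and the linear system can be recomputed mechanically. The following sketch implements operation (9) only for products of generators whose permutation part is a transposition, so that each $\sigma_i^{-1}$ acts on a vector by swapping two components; the function names are assumptions, and results are printed as residues mod 12 (so −7 appears as 5).

```python
# Generators used in the three products: (transposition, vector in Z_12^5).
P12 = ((1, 2), (0, 0, 0, 0, 0))
P14 = ((1, 4), (0, 0, 0, 0, 0))
R12 = ((1, 2), (-3, 3, 0, 0, 0))
R23 = ((2, 3), (0, -3, 3, 0, 0))
R42 = ((4, 2), (0, 3, 0, -3, 0))
L13 = ((1, 3), (4, 0, -4, 0, 0))
L42 = ((4, 2), (0, -4, 0, 4, 0))

def swap(v, t):
    """Action of the transposition t = (i, j) on a vector: exchange components i and j."""
    i, j = t
    w = list(v)
    w[i - 1], w[j - 1] = w[j - 1], w[i - 1]
    return tuple(w)

def product_of(*factors):
    """Evaluate a product written as in the text (the rightmost factor is (sigma_1, v_1))."""
    word = list(reversed(factors))
    perm = list(range(1, 6))            # perm[x-1] = image of x under sigma_k ... sigma_1
    total = (0,) * 5
    for k, (t, v) in enumerate(word):
        term = v
        for s, _ in reversed(word[:k]): # sigma_{k-1}^{-1} first, sigma_1^{-1} last
            term = swap(term, s)
        total = tuple((x + y) % 12 for x, y in zip(total, term))
        i, j = t
        perm = [j if p == i else i if p == j else p for p in perm]
    return tuple(perm), total

G1 = product_of(R42, L42)[1]
G2 = product_of(P14, L42, P14, R12)[1]
G3 = product_of(P12, L13, P12, R23)[1]

def combine(a, b, c):
    """a*G1 + b*G2 + c*G3 in Z_12^5."""
    return tuple((a * x + b * y + c * z) % 12 for x, y, z in zip(G1, G2, G3))

def solve(v):
    """Coefficients (a, b, c) for a target v with v[4] = 0 and sum(v) = 0 mod 12."""
    inv7 = pow(7, -1, 12)               # 7 is invertible mod 12
    return (inv7 * v[3]) % 12, (-inv7 * v[0]) % 12, (inv7 * v[2]) % 12

print(product_of(R42, L42))
print(product_of(P14, L42, P14, R12))
print(product_of(P12, L13, P12, R23))
print(combine(*solve((1, 2, 3, 6, 0))))
```

The modular inverse uses the three-argument `pow` with a negative exponent, which requires Python 3.8 or later.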

To obtain all elements $(Id, (v_1, v_2, v_3, v_4, v_5))$, with $(v_1, v_2, v_3, v_4, v_5) \in \mathbb{Z}_{12}^5$ such that $\sum_i v_i = 0$, it is sufficient to add to the 3 generators listed in (10) the generator $P_{12}P_{35}R_{23}P_{12}L_{15}L_{13} = (Id, (7, 0, 0, 0, -7))$.

But it is evident that $Z \cong \mathbb{Z}_{12}^4$, hence $PLRQ \cong S_5 \ltimes \mathbb{Z}_{12}^4$. □
Abstract. One of the most significant attitudinal shifts in the history of music occurred in the Renaissance, when an emerging triadic consciousness moved musicians towards a new scalar formation that placed major thirds on a par with perfect fifths. In this paper we revisit the confrontation between the two idealized scalar and modal conceptions, that of the ancient and medieval world and that of the early modern world, associated especially with Zarlino. We do this at an abstract level, in the language of algebraic combinatorics on words. In scale theory the juxtaposition is between well-formed and pairwise well-formed scales and modes, expressed in terms of Christoffel words or standard words and their conjugates, and the special Sturmian morphisms that generate them. Pairwise well-formed scales are encoded by words over a three-letter alphabet, and in our generalization we introduce special positive automorphisms of $F_3$, the free group over three letters.


Keywords: Pairwise well-formed scales and modes • Well-formed scales and modes • Well-formed words • Christoffel words • Standard words • Central words • Algebraic combinatorics on words • Special Sturmian morphisms


1 Introduction: Authentic and Triadic Modes 

Figure 1 shows a C-major scale with two different interpretations of its step interval pattern. In the annotation aaba|aab (above the staff) the two letters a and b designate the major and minor steps, respectively. The vertical stroke | designates the authentic divider of the mode into a species of the fifth aaba and a species of the fourth aab. This pattern is called the Authentic Division of the Ionian Mode. In the annotation ac|ba||cab (below the staff) the three letters a, c and b designate the greater and lesser major and the minor steps, respectively. Together they divide the major mode triadically into a species of the major third ac, a species of the minor third ba and a species of the fourth cab. This pattern shall be called the Triadic Division of the Ionian Major Mode.

Fig. 1. Authentic division of the Ionian and triadic division of the Ionian major mode.


The contrasting juxtaposition evokes several open questions of historical and 
systematic nature about the particular relevance of these modes for different 
types of music and music analysis. Within the discourse of mathematical music 
theory they point to the theory of well-formed scales and modes [2,5,6,10], on the
one hand, and to the theory of pairwise well-formed scales [3,4], on the other. In 
the present article we therefore extend some transformational innovations within 
the theory of well-formed modes in order to make them fruitful within a theory 
of pairwise well-formed modes. These investigations eventually contribute to a 
deeper theoretical understanding of the juxtaposition in Fig. 1. 

2 Non-singular Pairwise Well-Formed Modes 

In this section we revisit some important results from [4] about the structure of non-singular pairwise well-formed scales and re-interpret them in a word-theoretic context. The 3-letter word acbacab describes the species of the octave of the Ionian Major mode, and thereby it is the step interval pattern of a pairwise well-formed scale. The motivation behind this concept is the following: The two-letter word aabaaab, describing the Ionian species of the octave, can be obtained from acbacab by an identification of the letter c with the letter a: $\pi_{c \to a}(acbacab) = aabaaab$. In traditional music-theoretical terms this letter projection describes syntonic identification, i.e., neglecting the difference between the greater and lesser major steps. There are two more such letter identifications, both of which lead to well-formed modes, whence the term pairwise well-formed. One of them is $\pi_{b \to c}(bacabac) = cacacac$. It describes an identification of the minor step with the lesser major step, i.e., neglecting their difference, which we may call the (harmonic) apotome. This mode also neutralizes the difference between the major and minor thirds and can be seen as a modal refinement of the generic third-generated scale. The third letter projection $\pi_{a \to b}(bacabac) = bbcbbbc$ identifies the minor step with the greater major step. The neglected interval is the sum of the two previously mentioned ones, and so we can formally speak of an apo-syntonic identification. It is arguable, though, whether this third projection bears a direct musical meaning. As an auxiliary construction it proves to be very useful on a theoretical level. This becomes clear in the course of the article.
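
The three letter projections are plain letter substitutions, as the following short sketch illustrates. The function name `project` is an assumption introduced here.

```python
def project(word, source, target):
    """Letter projection: replace every occurrence of `source` by `target`."""
    return word.replace(source, target)

print(project("acbacab", "c", "a"))  # syntonic identification
print(project("bacabac", "b", "c"))  # apotomic identification
print(project("bacabac", "a", "b"))  # apo-syntonic identification
```

Each projection keeps the length of the word and reduces the alphabet from three letters to two.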

We will represent non-singular pairwise well-formed (PWWF) scales by words over a three-letter alphabet $A = \{a, b, c\}$, as in [3], and we will henceforth refer to PWWF words and drop the qualifier “non-singular” (the singular case is represented by the word abacaba and its word-theoretical conjugates). We denote the set of all PWWF words by $\mathfrak{A} \subset \{a, b, c\}^*$. From [2,4], $v \in \mathfrak{A}$ if and only if each of the three projections $\pi_{x \to y}$ (described above) results in a well-formed word, i.e., the conjugate of a Christoffel word, equivalently, of a standard word (see [1,7] for the relevant word theory background). For every word $v \in \mathfrak{A}$, the length $|v|$ is odd, and the multiplicities of two of the letters are the same: $|v|_b = |v|_c$. It follows that $|v|_a$ is odd. In light of these facts, given a special standard word, we can always construct a PWWF word, by the bisecting substitution defined below.

Definition 2.1. Consider a special standard morphism f acting on the word monoid $\{a, c\}^*$ and consider the word $w = f(ac) = w_1 w_2 \dots w_n$. We further suppose that $|w| = n$ is odd and that $|w|_c$ is even. Then we define the bisection of the word w as $v = v_1 v_2 \dots v_n \in \{a, b, c\}^*$ with

\[
v_k = \begin{cases}
a & \text{if } w_k = a, \\
b & \text{if } w_k = c \text{ and } |w_1 \dots w_k|_c \text{ is odd,} \\
c & \text{if } w_k = c \text{ and } |w_1 \dots w_k|_c \text{ is even.}
\end{cases}
\]

The bisecting substitution $\sigma : \{a, b, c\}^* \to \{a, b, c\}^*$ is then defined as

\[ \sigma(a) = v_1 \dots v_m, \qquad \sigma(b) = v_{m+1} \dots v_{2m}, \qquad \sigma(c) = v_{2m+1} \dots v_n, \qquad \text{where } m = |f(a)|. \]
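
The bisection of a word can be sketched as follows. The input word cacacac is taken from the projection $\pi_{b \to c}(bacabac) = cacacac$ given earlier; the sketch does not check that the input has the form f(ac) for a special standard morphism f, and the function name is an assumption.

```python
def bisect(w):
    """Bisection of a word over {a, c}: occurrences of c become b and c alternately."""
    out, seen_c = [], 0
    for letter in w:
        if letter == "a":
            out.append("a")
        elif letter == "c":
            seen_c += 1                                   # |w_1 ... w_k|_c
            out.append("b" if seen_c % 2 == 1 else "c")   # odd count -> b, even -> c
        else:
            raise ValueError("expected a word over {a, c}")
    return "".join(out)

print(bisect("cacacac"))
```

Applying the projection that replaces b by c to the result recovers the input word.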


Remark: PWWF scales have distinct inversions, whereas the inversion of a mode of a well-formed scale is a mode of that scale (e.g., Ionian inverted is Phrygian). In word-theoretical language, if w is a standard word, the reversal of w is in the conjugacy class of w. For $w \in \mathfrak{A}$, the reversal of w is in its own conjugacy class, distinct from that of w. For $k \in \mathbb{N}$, there are $\phi(k)/2$ distinct conjugacy classes of standard words of length k; for PWWF words (k odd), there are $\phi(k)$ distinct conjugacy classes [4]. The defining projections are insensitive to reversal, however, up to trivial replacements of the letters, so we may pair w with its reversal, and choose whichever is convenient as representative. There is thus a bijection between conjugacy classes of standard words and classes of PWWF words of odd length k.

For example, $w = bacabac \in \mathfrak{A}$, and its reversal is $w' = cabacab$. We have $\pi_{c \to a}(w) = baaabaa$ (representing Phrygian), $\pi_{b \to a}(w) = aacaaac$ (representing Ionian), and $\pi_{b \to c}(w) = cacacac$ (representing, e.g., Dorian thirds); while $\pi_{c \to a}(w') = aabaaab$ (Ionian), $\pi_{b \to a}(w') = caaacaa$ (Phrygian), and $\pi_{b \to c}(w') = cacacac$. Therefore, we may depart from either PWWF representative. (A future investigation will be dedicated to the musical interpretation of the distinction between the representatives).
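
The statements about reversal and conjugacy in this example can be checked directly. In the sketch below, two words are treated as conjugate when one is a cyclic rotation of the other; the helper names are assumptions.

```python
def conjugates(w):
    """All cyclic rotations of the word w."""
    return {w[i:] + w[:i] for i in range(len(w))}

def reversal_is_conjugate(w):
    """True if the reversal of w lies in the conjugacy class of w."""
    return w[::-1] in conjugates(w)

def project(word, source, target):
    return word.replace(source, target)

w = "bacabac"
w_rev = w[::-1]
print(w_rev)
print(reversal_is_conjugate("aabaaab"))   # a well-formed (two-letter) word
print(reversal_is_conjugate(w))           # the PWWF word
print(project(w, "c", "a"), project(w_rev, "c", "a"))
```

The two-letter Ionian word is conjugate to its reversal, while the three-letter word bacabac is not, in line with the remark.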


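Proposition 2.2. Consider a PWWF substitution $\sigma$ and let $f(a) = \pi_{b \to c}(\sigma(a))$, $f(c) = \pi_{b \to c}(\sigma(bc))$ denote its apotomic projection. Let further

\[ M_f = \begin{pmatrix} |f(a)|_a & |f(c)|_a \\ |f(a)|_c & |f(c)|_c \end{pmatrix} \]

denote the incidence matrix of f. Then the incidence matrix of $\sigma$ is given as

\[
M_\sigma = \begin{pmatrix}
|f(a)|_a & |f(a)|_a & |f(c)|_a - |f(a)|_a \\[2pt]
\left\lfloor \frac{|f(a)|_c}{2} \right\rfloor & \left\lceil \frac{|f(a)|_c}{2} \right\rceil & \frac{|f(c)|_c - |f(a)|_c}{2} \\[2pt]
\left\lceil \frac{|f(a)|_c}{2} \right\rceil & \left\lfloor \frac{|f(a)|_c}{2} \right\rfloor & \frac{|f(c)|_c - |f(a)|_c}{2}
\end{pmatrix}
\]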
3 Apo-syntonic Conversion

In addition to applying the letter projections $\pi_{b \to c}$, $\pi_{a \to b}$, $\pi_{c \to a}$ to words $v \in A^*$, we apply them to substitutions as follows (using the same symbols):

Definition 3.1. Consider a substitution (a monoid morphism) $\sigma : A^* \to A^*$. Then we obtain three induced substitutions $\pi_{b \to c}(\sigma) : \{a, c\}^* \to \{a, c\}^*$, $\pi_{a \to b}(\sigma) : \{b, c\}^* \to \{b, c\}^*$ and $\pi_{c \to a}(\sigma) : \{a, b\}^* \to \{a, b\}^*$ by virtue of:

\[
\begin{array}{ll}
[\pi_{b \to c}(\sigma)](a) := \pi_{b \to c}(\sigma(a)) & [\pi_{b \to c}(\sigma)](c) := \pi_{b \to c}(\sigma(bc)) \\
{[\pi_{a \to b}(\sigma)](b)} := \pi_{a \to b}(\sigma(ab)) & [\pi_{a \to b}(\sigma)](c) := \pi_{a \to b}(\sigma(c)) \\
{[\pi_{c \to a}(\sigma)](a)} := \pi_{c \to a}(\sigma(ab)) & [\pi_{c \to a}(\sigma)](b) := \pi_{c \to a}(\sigma(c))
\end{array}
\]

We say that $\sigma$ is an authentic PWWF substitution iff all three projections $\pi_{b \to c}(\sigma)$, $\pi_{a \to b}(\sigma)$ and $\pi_{c \to a}(\sigma)$ are Special Sturmian morphisms.

In the rest of this section we will assume that $\sigma$ is the bisecting substitution associated with a suitable special standard morphism f and hence $f = \pi_{b \to c}(\sigma)$. Further we use the symbols $g$ and $\tilde{g}$ for the projections $g = \pi_{a \to b}(\sigma)$ and $\tilde{g} = \pi_{c \to a}(\sigma)$. The diagram in Fig. 2 shows the interplay of $\sigma$ with its three projections.


Fig. 2. Interplay of a PWWF substitution $\sigma$ with its projections $f$, $g$ and $\tilde g$.


Our goal is now to understand the interdependence between $f$ and $g$.


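Proposition 3.2. Consider an authentic (non-singular) PWWF mode $\sigma \in \mathfrak{A}$ with the projections $f = \pi_{b\to c}(\sigma)$, $g = \pi_{a\to b}(\sigma)$ and $\tilde g = \pi_{c\to a}(\sigma)$. Then the common incidence matrix $M_g = M_{\tilde g}$ of $g$ and $\tilde g$ can be expressed in terms of the coefficients of the incidence matrix

$$M_f = \begin{pmatrix} |f(a)|_a & |f(c)|_a \\ |f(a)|_c & |f(c)|_c \end{pmatrix}$$

as follows:

$$M_g = M_{\tilde g} = \begin{pmatrix}
2|f(a)|_a + |f(a)|_c & |f(c)|_a - |f(a)|_a + \tfrac{1}{2}\left(|f(c)|_c - |f(a)|_c\right) \\
|f(a)|_c & \tfrac{1}{2}\left(|f(c)|_c - |f(a)|_c\right)
\end{pmatrix}$$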


Proof. The incidence matrix $M_g$ can be obtained by adding the first two columns and the first two rows of $M_\sigma$. Thus, by Proposition 2.2, the upper left entry of $M_g$ becomes

$$|f(a)|_a + |f(a)|_a + \left\lfloor \frac{|f(a)|_c}{2} \right\rfloor + \left\lceil \frac{|f(a)|_c}{2} \right\rceil = 2|f(a)|_a + |f(a)|_c.$$

The upper right entry becomes $|f(c)|_a - |f(a)|_a + \tfrac{1}{2}\left(|f(c)|_c - |f(a)|_c\right)$. The lower left entry becomes $\left\lceil \frac{|f(a)|_c}{2} \right\rceil + \left\lfloor \frac{|f(a)|_c}{2} \right\rfloor = |f(a)|_c$. The lower right entry remains $\tfrac{1}{2}\left(|f(c)|_c - |f(a)|_c\right)$.


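In order to understand the connection between $f$ and $g$ more directly, rather than via the substitution $\sigma$, we have a closer look at the structure of the linear map $\beta : GL_2(\mathbb{R}) \to GL_2(\mathbb{R})$ with

$$\beta\begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix} := \begin{pmatrix} 2a_{11} + a_{21} & a_{12} - a_{11} + (a_{22} - a_{21})/2 \\ a_{21} & (a_{22} - a_{21})/2 \end{pmatrix}.$$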


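With $R = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}$ let $\mathcal{X}_2$ and $\widetilde{\mathcal{X}}_2$ denote the following two subsets of $SL_2(\mathbb{N})$:

$$\mathcal{X}_2 = \left\{ \begin{pmatrix} b_{11} & b_{12} \\ b_{21} & 2b_{22} \end{pmatrix} \cdot R \;\middle|\; b_{11}, b_{12}, b_{21}, b_{22} \in \mathbb{N},\ 2b_{11}b_{22} - b_{12}b_{21} = 1 \right\},$$

$$\widetilde{\mathcal{X}}_2 = \left\{ R \cdot \begin{pmatrix} 2b_{11} & b_{12} \\ b_{21} & b_{22} \end{pmatrix} \;\middle|\; b_{11}, b_{12}, b_{21}, b_{22} \in \mathbb{N},\ 2b_{11}b_{22} - b_{12}b_{21} = 1 \right\}.$$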


Lemma 3.3. $SL_2(\mathbb{N}) \cap \beta^{-1}(SL_2(\mathbb{N})) = \mathcal{X}_2$ and $\beta(\mathcal{X}_2) = \widetilde{\mathcal{X}}_2$.


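Proof. Consider an arbitrary matrix $X = \begin{pmatrix} c_{11} & c_{12} \\ c_{21} & c_{22} \end{pmatrix} \in SL_2(\mathbb{N})$. Then we have

$$\beta^{-1}\begin{pmatrix} c_{11} & c_{12} \\ c_{21} & c_{22} \end{pmatrix} = \begin{pmatrix} \frac{c_{11} - c_{21}}{2} & \frac{c_{11} - c_{21}}{2} + c_{12} - c_{22} \\ c_{21} & c_{21} + 2c_{22} \end{pmatrix}.$$

The entry $c_{21} + 2c_{22}$ is larger than $c_{21}$ and therefore $\beta^{-1}(X) \in SL_2(\mathbb{N})$ iff

$$Y = \beta^{-1}(X) \cdot R^{-1} = \begin{pmatrix} \frac{c_{11} - c_{21}}{2} & c_{12} - c_{22} \\ c_{21} & 2c_{22} \end{pmatrix} \in SL_2(\mathbb{N}).$$

But in order to have a positive entry $c_{12} - c_{22}$ it turns out that $X \in SL_2(\mathbb{N})$ cannot be arbitrary. Also

$$Z = R^{-1} \cdot X = \begin{pmatrix} c_{11} - c_{21} & c_{12} - c_{22} \\ c_{21} & c_{22} \end{pmatrix}$$

must be in $SL_2(\mathbb{N})$. And this implies $c_{11} > c_{21}$. Finally, we have to deal with the condition that the upper left entry $\frac{c_{11} - c_{21}}{2}$ of $\beta^{-1}(X)$ needs to be an integer. This implies that $c_{11} - c_{21}$, which is also the upper left entry of $Z$, is even, i.e., $RZ = X \in \widetilde{\mathcal{X}}_2$.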

Corollary 3.4. The set $\mathcal{X}_2$ parametrizes the conjugation classes of all authentic PWWF substitutions $\sigma$ in terms of the incidence matrices $M_f$ of their associated apotomic projections: the special standard morphisms $f = \pi_{b\to c}(\sigma)$. Also the set $\widetilde{\mathcal{X}}_2$ parametrizes these same conjugation classes by virtue of the incidence matrices $M_g$ of their associated apo-syntonic projections $g = \pi_{a\to b}(\sigma)$.
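This motivates the following definition:

Definition 3.5. The restriction of $\beta$ to the subset $\mathcal{X}_2$ is called the apo-syntonic conversion:

$$\beta : \mathcal{X}_2 \to \widetilde{\mathcal{X}}_2.$$

Let $S : SL_2(\mathbb{N}) \to SL_2(\mathbb{N})$ denote the main-diagonal flip

$$S\begin{pmatrix} b_{11} & b_{12} \\ b_{21} & b_{22} \end{pmatrix} := \begin{pmatrix} b_{22} & b_{12} \\ b_{21} & b_{11} \end{pmatrix}.$$

Lemma 3.6. $S$ mutually exchanges $\mathcal{X}_2$ and $\widetilde{\mathcal{X}}_2$, i.e. $S(\mathcal{X}_2) = \widetilde{\mathcal{X}}_2$ and $S(\widetilde{\mathcal{X}}_2) = \mathcal{X}_2$.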
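Proposition 3.7. We have the following commutative diagram, i.e. $\beta \circ S \circ \beta = S$ on $\mathcal{X}_2$:

$$\begin{array}{ccc}
\mathcal{X}_2 & \xrightarrow{\ \beta\ } & \widetilde{\mathcal{X}}_2 \\
{\scriptstyle S}\downarrow & & \downarrow{\scriptstyle S} \\
\widetilde{\mathcal{X}}_2 & \xleftarrow{\ \beta\ } & \mathcal{X}_2
\end{array}$$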
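The interplay of the conversion $\beta$ with the flip $S$ can be explored numerically. The sketch below is an illustration under stated assumptions: matrices are stored as nested tuples, exact rational arithmetic is used for the halves, and the test family `family_matrix(k, n)` is taken from the incidence matrices $M_f$ quoted in Section 4. None of the function names come from the paper.

```python
from fractions import Fraction

def beta(m):
    """Apo-syntonic conversion on a 2x2 matrix ((a11, a12), (a21, a22))."""
    (a11, a12), (a21, a22) = m
    half = Fraction(a22 - a21, 2)
    return ((2 * a11 + a21, a12 - a11 + half), (a21, half))

def flip(m):
    """Main-diagonal flip S: exchange the two diagonal entries."""
    (b11, b12), (b21, b22) = m
    return ((b22, b12), (b21, b11))

def family_matrix(k, n):
    """Incidence matrix M_f of f = G^k D G^{2n}, as quoted in Section 4."""
    return ((k + 1, 2 * n * (k + 1) + k), (1, 2 * n + 1))

M_f = family_matrix(0, 1)   # Phrygian case: ((1, 2), (1, 3))
M_g = beta(M_f)
print(M_g == ((3, 2), (1, 1)))
print(beta(flip(beta(M_f))) == flip(M_f))
```

Both comparisons hold for this example; `Fraction` values compare equal to the corresponding integers, so no rounding is involved.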


Proposition 3.8. Under the convention that the apotomic projection $f$ is a standard morphism, the apo-syntonic projection $g$ also turns out to be a special standard morphism.

Proof. As $f$ is special standard we have a decomposition $w = f(ac) = f(a)f(c) = tca\,sac$ with a negative (= plagal) standard word $tca$ and a positive (= authentic) standard word $sac$. We have $g(bc) = u_1 u_2 \dots u_{|w|} \in \{b, c\}^*$ with

$$u_k = \begin{cases}
b & \text{if } w_k = a, \\
b & \text{if } w_k = c \text{ and } |w_1 \dots w_k|_c \text{ is odd,} \\
c & \text{if } w_k = c \text{ and } |w_1 \dots w_k|_c \text{ is even.}
\end{cases}$$

The final letter of $u$ is $c$, because $|w|_c$ is even, so by definition of $u$ above, the last letter $c$ is fixed. Let $v = c\,u\,c^{-1} \in \{b, c\}^*$ denote the result of conjugating $u$ with $c^{-1}$. In order to show that $g$ is a special standard morphism, it is sufficient to show that $v = v_1 v_2 \dots v_{|w|}$ is the bad conjugate of $u$. Let $m = |g(b)|$ denote the length of $g(b)$. It is sufficient to show that $|v_1 \dots v_m|_b$ differs from $|u_1 \dots u_m|_b = |g(b)|_b$. Knowing that $f(a)$ is a prefix of $f(c)$ we write $w = f(a)f(a)f(a^{-1}c) = tca\,tca\,rac$ and we may conclude that $w_m = a$, and hence $u_m = b = v_{m+1}$. But in the light of $v_1 = c$ this implies $|v_1 \dots v_m|_b = |g(b)|_b - 1$, i.e. $v$ is the bad conjugate.


4 PWWF Substitutions and Automorphisms of $F_3$

In the two-dimensional situation of well-formed modes we may interpret the Sturmian morphisms as positive automorphisms of the free group $F_2$. In the world of substitutions on words in three letters their analogues constitute different transformational concepts. The PWWF substitutions are well adapted to the family of (non-singular) pairwise well-formed modes. Still there is a small subfamily of pairwise well-formed modes, where these substitutions are also automorphisms of the free group $F_3$. This final section is dedicated to their study. To get a concrete idea about the role of $F_3$-automorphisms, we look into the Authentic and Triadic Divisions of the Phrygian mode (see Fig. 3). In the theory of well-formed modes one describes the species $baaa$ and $baa$ of the fifth and fourth as images of $a$ and $b$ under a word transformation $g(a) = baaa$, $g(b) = baa$. The left side of Fig. 3 shows a decomposition of this compound transformation into three elementary ones. First, the fifth $a$ is filled with a fourth and a major step, $a \mapsto ba$; then both fourths are filled with a minor third and a major step, $b \mapsto ba$; and finally, both minor thirds are filled with a minor step and a major step, $b \mapsto ba$.




Fig. 3. Construction of the Authentic Phrygian and the Triadic Phrygian Minor modes 
through substitutions. 


The right side of Fig. 3 shows an analogous procedure for the construction of the Phrygian Minor mode $ba|ca||bac$. The fourth is filled with a minor third and a lesser major step: $c \mapsto bc$. Then the major third is filled with a lesser and a greater major step: $a \mapsto ca$, and finally both minor thirds are filled with a minor step followed by a greater major step: $b \mapsto ba$. This final act in the generation of the Phrygian Minor mode does not work analogously for the Ionian Major mode, because there we find two different species of the minor third: $ba$ and $ab$.

The automorphism group $\mathrm{Aut}(F_n)$ of the free group $F_n = \langle x_1, x_2, \dots, x_n \rangle$ is redundantly generated by the elementary Nielsen transformations (see [8,9], p. 162 ff.), namely the letter transpositions $x_i \mapsto x_k$, $x_k \mapsto x_i$, cyclic letter permutations $x_1 \mapsto x_2 \mapsto \dots \mapsto x_n \mapsto x_1$, letter inversions $x_i \mapsto x_i^{-1}$ and the substitutions of the types $x_i \mapsto x_i x_k$ or $x_i \mapsto x_k x_i$. The automorphisms of $F_3$, which potentially coincide with PWWF substitutions, are necessarily positive, so we don't need to consider letter inversions here. This leads to the following definition:

Definition 4.1. An authentic PWWF substitution is called morphic, if it is a positive automorphism of $F_3$.

For two different letters $x, y \in \{a, b, c\}$ let $E_{xy}, A_{xy}, P_{xy} \in \mathrm{Aut}(F_3)$ denote the following positive automorphisms of the free group $F_3$, where $z \in \{a, b, c\}$ denotes the third letter, respectively:

$E_{xy}(x) = y$, $E_{xy}(y) = x$, $E_{xy}(z) = z$,

$A_{xy}(x) = xy$, $A_{xy}(y) = y$, $A_{xy}(z) = z$,

$P_{xy}(x) = yx$, $P_{xy}(y) = y$, $P_{xy}(z) = z$.

Hence, a PWWF substitution $\sigma : \{a, b, c\}^* \to \{a, b, c\}^*$ is morphic, if it can be written as a composition of a finite number of letter permutations $E_{xy}$ and substitutions of the types $A_{xy}$ and/or $P_{xy}$. Now we inspect the seven conjugates of the Phrygian minor mode to which also the Ionian major mode belongs. The following proposition is thus the portrait of a very special conjugation class of modes. As in the two-letter case it contains bad conjugates.

Proposition 4.2. Let $\sigma : \{a, b, c\}^* \to \{a, b, c\}^*$ with $\sigma(b) = ba$, $\sigma(a) = ca$ and $\sigma(c) = bac$ denote the PWWF substitution which is associated with the Phrygian minor mode. The cycle of the single letter conjugations $ba|ca||bac \to ac|ab||acb \to ca|ba||cba \to ab|ac||bac \to ba|cb||aca \to ac|ba||cab \to cb|ac||aba \to ba|ca||bac$ contains two bad conjugates and, accordingly, five PWWF modes. Four of these PWWF modes are morphic. The exceptional good, but amorphous instance, is the Ionian major mode. These seven conjugates together with their associated projections under $\pi_{b\to c}(\sigma)$, $\pi_{c\to a}(\sigma)$ and $\pi_{a\to b}(\sigma)$ are listed below:


authentic       apotomic      syntonic      apo-syntonic   transform.
triadic mode    projection    projection    projection     type

ba|ca||bac      ca|cacac      baaa|baa      bbcb|bbc       morphic**
ac|ab||acb      ac|acacc      aaab|aab      bcbb|bcb       morphic
ca|ba||cba      ca|cacca      aaba|aba      cbbb|cbb       morphic
ab|ac||bac      ac|accac      abaa|baa      bbbc|bbc       morphic
ba|cb||aca      ca|ccaca      baab|aaa      bbcb|bcb       bad*
ac|ba||cab      ac|cacac      aaba|aab      bcbb|cbb       good †
cb|ac||aba      cc|acaca      abaa|aba      cbbc|bbb       bad**
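The first four columns of this table can be regenerated mechanically from the word $bacabac$. The sketch below is only an illustration: it assumes that conjugation means cyclic rotation by one letter, that the three projections are plain letter replacements, and that the divider marks sit after 2 and 4 letters in the triadic mode, after 2 letters in the apotomic projection and after 4 letters in the other two projections. The helper names are hypothetical.

```python
WORD = "bacabac"  # the Phrygian minor mode ba|ca||bac without divider marks

def conjugates(word):
    """All letter-by-letter conjugates (cyclic rotations), starting with the word itself."""
    return [word[i:] + word[:i] for i in range(len(word))]

def project(word, src, dst):
    """Letter projection pi_{src->dst}: replace every src by dst."""
    return word.replace(src, dst)

def cut(word, *positions):
    """Insert '|' after the first position and '||' after the second one (if given)."""
    marks = {positions[0]: "|"}
    if len(positions) > 1:
        marks[positions[1]] = "||"
    return "".join(marks.get(i, "") + ch for i, ch in enumerate(word))

rows = [
    (cut(w, 2, 4),                   # authentic triadic mode
     cut(project(w, "b", "c"), 2),   # apotomic projection
     cut(project(w, "c", "a"), 4),   # syntonic projection
     cut(project(w, "a", "b"), 4))   # apo-syntonic projection
    for w in conjugates(WORD)
]

for row in rows:
    print("  ".join(row))
```

The classification in the last column (morphic, good, bad) is not computed here; it depends on the theory of special Sturmian morphisms developed in the text.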


Proof. In the left column of the table $ba|ca||bac$ undergoes the full cycle of letter-by-letter conjugations. In parallel the projections $\pi_{b\to c}(\sigma)$, $\pi_{c\to a}(\sigma)$, $\pi_{a\to b}(\sigma)$ run through their corresponding conjugations. By virtue of Proposition 3.8 the conjugations of the apotomic and the apo-syntonic projections are “in sync”, i.e. they start with special standard modes (marked as morphic**) and they end with bad modes (marked as bad**). As a consequence there are only two bad conjugates among the triadic modes. The generation of the four morphic modes, up to letter permutations, is given below:

$A_{ba}P_{ac}P_{cb}(b|a||c) = A_{ba}P_{ac}(b|a||bc) = A_{ba}(b|ca||bc) = ba|ca||bac$

$P_{ca}A_{ab}P_{bc}(c|a||b) = P_{ca}A_{ab}(c|a||cb) = P_{ca}(c|ab||cb) = ac|ab||acb$

$A_{ba}P_{ac}A_{cb}(a|b||c) = A_{ba}P_{ac}(a|b||cb) = A_{ba}(ca|b||cb) = ca|ba||cba$
$P_{ca}A_{ab}A_{bc}(a|c||b) = P_{ca}A_{ab}(a|c||bc) = P_{ca}(ab|c||bc) = ab|ac||bac$
The obstacle for the “good” PWWF Ionian major mode to be generated by an $F_3$-automorphism is the co-existence of the two factors $ab$ and $ba$.
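The append and prepend productions used in this proof are easy to simulate on word triples. The following sketch assumes that $A_{xy}$ rewrites $x \mapsto xy$, that $P_{xy}$ rewrites $x \mapsto yx$, and that in a product the rightmost production acts first on the triple; the dictionary representation and the function names are illustrative choices, not the paper's notation.

```python
def A(x, y):
    """Append production A_xy: x -> xy, all other letters fixed."""
    return {x: x + y}

def P(x, y):
    """Prepend production P_xy: x -> yx, all other letters fixed."""
    return {x: y + x}

def apply(rule, triple):
    """Rewrite every letter of every word in the triple according to the rule."""
    return tuple("".join(rule.get(ch, ch) for ch in word) for word in triple)

def compose(rules, triple):
    """Apply a product of productions to a triple, rightmost factor first."""
    for rule in reversed(rules):
        triple = apply(rule, triple)
    return triple

# A_ba P_ac P_cb (b|a||c), the first line of the proof
phrygian_minor = compose([A("b", "a"), P("a", "c"), P("c", "b")], ("b", "a", "c"))
print("{}|{}||{}".format(*phrygian_minor))
```

Since every step only lengthens one letter's image by one neighbouring letter, each intermediate triple can be printed and compared with the intermediate terms of the corresponding line.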

In addition to the bad* Locrian mode (with a bad syntonic projection) there is the bad** Dorian minor mode (with bad apotomic and apo-syntonic projections), whose structural defects have been discussed in 19th-century treatises, such as Moritz Hauptmann’s 1853 Die Natur der Harmonik und der Metrik.

We show now that a similar picture arises in connection with a certain family $\mathcal{M}$ of morphic PWWF modes and we conclude the article with the conjecture that this family actually exhausts the morphic PWWF modes entirely. It is useful to start the investigation of this family from the authentic standard modes, whose bisecting transformations then yield the associated PWWF modes. In this particular case, the general form of the standard morphism $f$, generating the single-divider mode $f(a)|f(c)$, is $f = G^k D G^{2n}$ with $n > 0$ and $k \geq 0$. The corresponding apo-syntonic conversion of $f$, the standard morphism $g$, generating the first of the two double-divider modes $g(b)|g(c)$, is $g = G^{2k+2} D G^{n-1}$. Thus, we have the two sets

$$\mathcal{F} = \{G^k D G^{2n} \mid n > 0,\ k \geq 0\} \quad \text{and} \quad \mathcal{G} = \{G^{2n} D G^{k} \mid n > 0,\ k \geq 0\}$$

together with the apo-syntonic conversion $\theta : \mathcal{F} \to \mathcal{G}$, where $\theta(G^k D G^{2n}) := G^{2k+2} D G^{n-1}$. Furthermore, we consider the reversal map, i.e. the unique anti-automorphism $\mathrm{rev} : \langle G, D \rangle \to \langle G, D \rangle$ fixing both $G$ and $D$. This map $\mathrm{rev}$ sends $\mathcal{F}$ to $\mathcal{G}$ and vice versa.

The following proposition specifies the commutative diagram for matrices in Proposition 3.7 to the elements of the sets $\mathcal{F}$ and $\mathcal{G}$:

Proposition 4.3. $\theta \circ \mathrm{rev} \circ \theta = \mathrm{rev}$.

Proof. Although the relation is a corollary of Proposition 3.7 we give a direct proof here:

$$\theta(\mathrm{rev}(\theta(G^k D G^{2n}))) = \theta(\mathrm{rev}(G^{2k+2} D G^{n-1})) = \theta(G^{n-1} D G^{2k+2}) = G^{2n} D G^{k} = \mathrm{rev}(G^k D G^{2n}).$$

The third corresponding special Sturmian morphism $\tilde g$, generating the syntonic projection $\tilde g(a|b)$ of our PWWF mode, has the following form:

$$\tilde g = \tilde\theta(f) = \tilde\theta(G^k D G^{2n}) = G^k \tilde G^{k+2} D G^{n-1}.$$

Thus, the morphism $\tilde g$ is preceded by precisely $k + 2$ conjugate morphisms in the Zarlino ordering, namely by $G^{k+l} \tilde G^{k+2-l} D G^{n-1}$ for $l = 1, \dots, k + 2$. For the corresponding incidence matrices we have:

$$M_f = \begin{pmatrix} k + 1 & 2n(k + 1) + k \\ 1 & 2n + 1 \end{pmatrix} \quad \text{and} \quad M_g = M_{\tilde g} = \begin{pmatrix} 2k + 3 & n(2k + 3) - 1 \\ 1 & n \end{pmatrix}.$$
Proposition 4.4. Consider the authentic PWWF three-letter mode

$$v_{k,n} := a^k ba \,|\, a^k ca \,\|\, (a^k ba\, a^k ca)^{n-1} a^k ba\, a^k c.$$

Its apotomic, apo-syntonic and syntonic projections are $\pi_{b\to c}(v_{k,n}) = f(a|c)$, $\pi_{a\to b}(v_{k,n}) = g(b|c)$ and $\pi_{c\to a}(v_{k,n}) = \tilde g(a|b)$, respectively.

Proof.

$$f(a|c) = G^k D G^{2n}(a|c) = G^k D(a|a^{2n}c) = G^k(ca|(ca)^{2n}c) = a^k ca \,|\, (a^k ca)^{2n} a^k c = \pi_{b\to c}(v_{k,n}),$$

$$g(b|c) = G^{2k+2} D G^{n-1}(b|c) = G^{2k+2} D(b|b^{n-1}c) = G^{2k+2}(cb|(cb)^{n-1}c) = b^{2k+2}cb \,|\, (b^{2k+2}cb)^{n-1} b^{2k+2}c = \pi_{a\to b}(v_{k,n}),$$

$$\tilde g(a|b) = G^k \tilde G^{k+2} D G^{n-1}(a|b) = G^k \tilde G^{k+2} D(a|a^{n-1}b) = G^k \tilde G^{k+2}(ba|(ba)^{n-1}b) = G^k(ba^{k+3}|(ba^{k+3})^{n-1}ba^{k+2}) = a^k ba^{k+3} \,|\, (a^k ba^{k+3})^{n-1} a^k ba^{k+2} = \pi_{c\to a}(v_{k,n}).$$
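The first line of this proof can be replayed on concrete words. The sketch below is a hedged illustration: it assumes, as read off from the computation above, that on the alphabet $\{a, c\}$ the morphism $G$ rewrites $c \mapsto ac$ and $D$ rewrites $a \mapsto ca$, and that the mode $v_{k,n}$ is stored as the triple of its three segments. Function names are hypothetical.

```python
def v(k, n):
    """The three segments of the mode v_{k,n}."""
    ba, ca = "a" * k + "ba", "a" * k + "ca"
    return ba, ca, (ba + ca) * (n - 1) + ba + "a" * k + "c"

def substitute(rule, word):
    """Apply a letter-to-word rule to every letter of the word."""
    return "".join(rule.get(ch, ch) for ch in word)

def power(rule, times, word):
    """Apply the same rule repeatedly."""
    for _ in range(times):
        word = substitute(rule, word)
    return word

def f_images(k, n):
    """Images f(a), f(c) under f = G^k D G^{2n} (assumed: G: c -> ac, D: a -> ca)."""
    G, D = {"c": "ac"}, {"a": "ca"}
    return tuple(power(G, k, substitute(D, power(G, 2 * n, x))) for x in "ac")

k, n = 1, 2
fa, fc = f_images(k, n)
apotomic = "".join(v(k, n)).replace("b", "c")   # pi_{b->c} of the whole mode
print(fa + fc == apotomic)
print((fa.count("a"), fc.count("a")), (fa.count("c"), fc.count("c")))
```

The letter counts printed in the last line are the columns of the incidence matrix $M_f$ for the chosen parameters.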

The subsequent proposition provides an explicit portrait of the full conjugation class of the authentic PWWF mode $v_{k,n}$.

Proposition 4.5. The table below lists all letter-by-letter conjugations of the authentic PWWF mode $v_{k,n}$ and characterizes them as morphic, good or bad. The segment between the bad** mode (with bad apotomic and apo-syntonic projections) and the bad* mode (with bad syntonic projection) is exclusively occupied by morphic modes. The opposite segment between the bad* mode and the bad** mode is exclusively occupied by good modes:


$a^k ba \,|\, a^k ca \,\|\, (a^k ba\, a^k ca)^{n-1}(a^k ba)(a^k c)$   morphic**

$a^{k-1}ba^2 \,|\, a^{k-1}ca^2 \,\|\, (a^{k-1}ba^2\, a^{k-1}ca^2)^{n-1}(a^{k-1}ba^2)(a^{k-1}ca)$   morphic

$a^{k-l}ba^{l+1} \,|\, a^{k-l}ca^{l+1} \,\|\, (a^{k-l}ba^{l+1}\, a^{k-l}ca^{l+1})^{n-1}(a^{k-l}ba^{l+1})(a^{k-l}ca^{l})$   morphic

$ba^{k+1} \,|\, ca^{k+1} \,\|\, (ba^{k+1}ca^{k+1})^{n-1}(ba^{k+1})(ca^{k})$   morphic

$a^{k+1}c \,|\, a^{k+1}b \,\|\, (a^{k+1}ca^{k+1}b)^{n-1}(a^{k+1}c)(a^{k}b)$   morphic

$ca^{k+1} \,|\, ba^{k+1} \,\|\, (ca^{k+1}ba^{k+1})^{n-1}(ca^{k})(ba^{k+1})$   morphic

$a^{k+1}b \,|\, a^{k+1}c \,\|\, (a^{k+1}ba^{k+1}c)^{n-1}(a^{k}b)(a^{k+1}c)$   morphic

$ba^{k+1} \,|\, ca^{k+1} \,\|\, (ba^{k+1}ca^{k+1})^{n-2}(ba^{k+1}ca^{k})(ba^{k+1}ca^{k+1})$   morphic

$a^{k+1}c \,|\, a^{k+1}b \,\|\, (a^{k+1}ca^{k+1}b)^{n-2}(a^{k+1}ca^{k}b)(a^{k+1}ca^{k+1}b)$   morphic

$ca^{k+1} \,|\, ba^{k+1} \,\|\, (ca^{k+1}ba^{k+1})^{n-2}(ca^{k}ba^{k+1})(ca^{k+1}ba^{k+1})$   morphic

$a^{k+1}b \,|\, a^{k+1}c \,\|\, (a^{k+1}ba^{k+1}c)^{n-2}(a^{k}ba^{k+1}c)(a^{k+1}ba^{k+1}c)$   morphic

$aba^{k} \,|\, aca^{k} \,\|\, ba^{k}\, aca^{k}(aba^{k}\, aca^{k})^{n-1}$   morphic

$ba^{k+1} \,|\, ca^{k}b \,\|\, (a^{k+1}ca^{k+1}b)^{n-1}a^{k+1}ca^{k+1}$   bad*

$a^{k+1}c \,|\, a^{k}ba \,\|\, (a^{k}ca^{k+1}ba)^{n-1}a^{k}ca^{k+1}b$   good*

$aca^{k} \,|\, ba^{k+1} \,\|\, (ca^{k+1}ba^{k+1})^{n-1}ca^{k+1}ba^{k}$   good

$ca^{k}b \,|\, a^{k+1}c \,\|\, (a^{k+1}ba^{k+1}c)^{n-1}a^{k+1}ba^{k+1}$   bad**
Proof. The following calculation shows that $v_{k,n}$ is morphic. The calculations for the other morphic modes are analogous:

$$\begin{aligned}
P_{ba}^k P_{ca}^k A_{ba} P_{ac} (P_{cb}P_{ca})^{n-1} P_{cb} E_{ab}(a|b||c)
&= P_{ba}^k P_{ca}^k A_{ba} P_{ac} (P_{cb}P_{ca})^{n-1} P_{cb}(b|a||c) \\
&= P_{ba}^k P_{ca}^k A_{ba} P_{ac} (P_{cb}P_{ca})^{n-1}(b|a||bc) \\
&= P_{ba}^k P_{ca}^k A_{ba} P_{ac}(b|a||(ba)^{n-1}bc) \\
&= P_{ba}^k P_{ca}^k A_{ba}(b|ca||(bca)^{n-1}bc) \\
&= P_{ba}^k P_{ca}^k (ba|ca||(baca)^{n-1}bac) \\
&= P_{ba}^k (ba|a^k ca||(ba\,a^k ca)^{n-1}ba\,a^k c) \\
&= a^k ba|a^k ca||(a^k ba\,a^k ca)^{n-1}a^k ba\,a^k c \\
&= v_{k,n}.
\end{aligned}$$

We try to write the good** mode $\gamma_{k,n} = a^{k+1}c \,|\, a^k ba \,\|\, (a^k ca^{k+1}ba)^{n-1}a^k ca^{k+1}b$ as an image $f(a|b||c)$ under an automorphism $f = f_m f_{m-1} \cdots f_1 f_0$, where the $f_i$ ($i = 1, \dots, m$) are supposed to be productions of the type $A_{xy}$ or $P_{xy}$ and where $f_0$ is a letter permutation. First we observe that $f_m$, the last of these morphisms, cannot be a production of $b$'s or $c$'s: First of all, $A_{bc}$, $A_{cb}$, $P_{bc}$ and $P_{cb}$ are excluded because there are no letters $c$ and $b$ neighboring each other (even in the case $k = 0$). But furthermore, not all instances of the letter $a$ are followed or preceded by either exclusively $b$ or exclusively $c$, and so also the productions $A_{ac}$, $A_{ab}$, $P_{ac}$ and $P_{ab}$ can be excluded. The only remaining possibilities are productions of the letter $a$. Among these the append-transformations $A_{ba}$ and $A_{ca}$ are both excluded, as the single-divider prefix $a^{k+1}c$ of $\gamma_{k,n}$ ends on $c$ and the double-divider suffix $(a^k ca^{k+1}ba)^{n-1}a^k ca^{k+1}b$ ends on $b$. For $k \neq 0$ the prepend transformations $P_{ba}$ and $P_{ca}$ are suitable in order to produce $\gamma_{k,n}$ from shorter words. To be more precise: $P_{ba}$ and $P_{ca}$ commute with each other and both can be applied $k$ times in any order to the triple $\gamma_{0,n} = ac|ba||(caba)^{n-1}cab$ to produce $\gamma_{k,n}$. Here is one of them:

$$\begin{aligned}
a^{k+1}c|a^k ba||(a^k ca^{k+1}ba)^{n-1}a^k ca^{k+1}b &= P_{ba}^k(a^{k+1}c|ba||(a^k caba)^{n-1}a^k cab) \\
&= P_{ba}^k(P_{ca}^k(ac|ba||(caba)^{n-1}cab)).
\end{aligned}$$

A closer look at $\gamma_{0,n}$ shows that it is not an image of a shorter word-triple under any of the 8 transformations of the type $A_{xy}$ or $P_{xy}$.

A similar line of argument works for any of the good modes. For $0 < l < k$ the general form of a good mode is

$$\begin{aligned}
&a\,a^{k-l}ca^{l} \,|\, a^{k-l}ba^{l}a \,\|\, (a^{k-l}ca^{l}a\,a^{k-l}ba^{l}a)^{n-1}a^{k-l}ca^{l}a\,a^{k-l}ba^{l} \\
&\quad = A_{ca}^{l}(A_{ba}^{l}(P_{ca}^{k-l}(P_{ba}^{k-l}(\gamma_{0,n})))).
\end{aligned}$$

So all the good modes are images of the mode $\gamma_{0,n}$. And this is the only possibility to generate them from a shorter triple.

Conjecture 4.6. Consider an authentic PWWF substitution $\sigma$ in the sense of Definition 3.1. The substitution $\sigma$ is morphic (i.e. is a positive automorphism of $F_3$) iff its generated mode $\sigma(a)|\sigma(b)||\sigma(c)$, up to letter permutation, is an instance of a morphic mode in Proposition 4.5.
5 Conclusion

Since the rise of the triad in its role as a governing concept in the music of harmonic tonality, music theorists have been in a quandary as to how to adjust the immemorial diatonic scale so as to be compatible with the triad’s new primacy. The interval of the major third challenges the status of the perfect fifth as a scale generator, and in the course of this competition it undermines the validity of other properties that are consequences of the fifth-generatedness of the diatonic scale, such as the well-formedness property. The concept of pairwise well-formedness offers a reconciliation, insofar as it implements the idea of a coexistence of three well-formed scale structures within one parent scale. The letter projections mediate between the competing interpretations of a fifth- and a third-generated scale. The present paper offers a transformational upgrade to that earlier basic insight. Following the pattern of the investigation of well-formed modes through automorphisms of the free group it clarifies the combinatorial behavior of all non-singular pairwise well-formed modes. For the concrete case of the Ionian major mode $ac|ba||cab$, it turns out that the underlying substitution is not an automorphism of the free group $F_3$, which is an exception in comparison to the Phrygian minor, Lydian major, Mixolydian major and Aeolian minor modes. This exceptional status corresponds to the musical fact that the species of major third $ac$ and $ca$ as well as those of the minor third $ba$ and $ab$ are different. The mathematical status of the Dorian minor mode as a bad mode has its musical counterpart in the fact that the species $cbac$ doesn’t form a proper fifth. This fact in turn is reflected in the special treatment some nineteenth-century theorists accorded the supertonic ii harmony in major, as a sort of cousin to the diminished supertonic triad in minor.
Abstract. The paper deals with the question of homometry in the dihedral groups $D_n$ of order $2n$. These groups are non-commutative, leading to new and challenging definitions of homometry, as compared to the well-known case of homometry in the commutative group $\mathbb{Z}_n$. We give here a musical interpretation of homometry in $D_{12}$ using the well-known neo-Riemannian groups, some results on a complete enumeration of homometric sets for small values of $n$, and some properties disclosing the deep links between homometry in $\mathbb{Z}_n$ and homometry in $D_n$.


Keywords: Homometry · Interval vector · T/I-group · PLR-group · Dihedral groups · Semi-direct product · Discrete Fourier transform


1 Introduction 

The concept of homometry first appeared in the 1930s in the field of crystallography. The question was to determine the structure of a crystal from its X-ray diffraction pattern. This type of measurement is directly related to the intensity of the Fourier transform of the crystallographic structure, but the phase information is lost in the process. The problem was therefore to know if a complete reconstruction of the structure of the crystal was possible. This problem later found applications in various fields, such as music theory, where the question was to characterize a set of notes (a chord, a melody) from the intervals that compose it. Homometry has been studied through different approaches (group theory [1], Fourier transform [2], distribution theory [3,4], etc.) and is an open field of research.

The classical way to model the $n$-tone equal temperament in musical set theory is to consider the cyclic group $\mathbb{Z}_n = \mathbb{Z}/n\mathbb{Z}$ as the set of pitch classes, for instance $\mathbb{Z}_{12} = \{0 = C, 1 = C\sharp, \dots, 11 = B\}$. Following David Lewin’s constructions described and systematized in [5], for any subsets $A$ and $B$ in $\mathbb{Z}_n$, one can consider the interval function $\mathrm{ifunc}(A, B)$ whose components are

$$\mathrm{ifunc}(A, B)(k) = \#\{(a, b) \in A \times B \mid b - a = k\}$$
for $k \in \mathbb{Z}_n$, and the interval vector $\mathrm{iv}(A)$ whose components are defined by $\mathrm{iv}(A)(k) = \mathrm{ifunc}(A, A)(k)$. Two sets $A$ and $B$ in $\mathbb{Z}_n$ are homometric if they have the same interval vector ($\mathrm{iv}(A) = \mathrm{iv}(B)$), meaning that they contain the same intervals with the same multiplicities. This is traditionally called the Z-relation and was mainly presented by Forte [6]. In this paper we will only use the word homometry, which refers to the same concept. The actions of transposition and inversion clearly do not change the interval vector of a set, hence two homometric sets which do not belong to the same set class modulo transpositions and inversions will be called non-trivial homometric sets. A well-known example of a non-trivial homometric pair in $\mathbb{Z}_{12}$ is $(\{C, D\flat, E, G\flat\}, \{C, D\flat, E\flat, G\})$, for which the interval vector is [1,1,1,1,1,2,1,1,1,1,1]. More detailed explanations can be found in [3,7].
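The interval function and the interval vector are straightforward to compute. The following minimal sketch assumes pitch classes are encoded as integers modulo 12 with $C = 0$, so that the two tetrachords above become $\{0, 1, 4, 6\}$ and $\{0, 1, 3, 7\}$; the function names mirror the notation of the text but are otherwise arbitrary. Note that the list returned by `iv` also contains the component $k = 0$ (the cardinality of the set), which the vector quoted above omits.

```python
from collections import Counter

def ifunc(A, B, n=12):
    """Interval function: entry k counts the pairs (a, b) in A x B with b - a = k mod n."""
    counts = Counter((b - a) % n for a in A for b in B)
    return [counts[k] for k in range(n)]

def iv(A, n=12):
    """Interval vector of A, indexed by k = 0, ..., n - 1."""
    return ifunc(A, A, n)

X = {0, 1, 4, 6}   # C, Db, E, Gb
Y = {0, 1, 3, 7}   # C, Db, Eb, G

print(iv(X)[1:])
print(iv(X) == iv(Y))
```

Checking that the two sets are not related by transposition or inversion is a separate step, not performed here.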

The concept of group action of $(\mathbb{Z}_n, +)$ on itself by translation (where $n \in \mathbb{Z}_n$ acts on $a \in \mathbb{Z}_n$ by $n + a$) can also be used to define the interval vector. Thus we call interval between $a$ and $b$, written $\mathrm{int}(a, b)$, the element $n$ such that $n + a = b$, and $\mathrm{iv}(A)(k) = \#\{(a, b) \in A^2 \mid \mathrm{int}(a, b) = k\}$ for $k \in \mathbb{Z}_n$. In a more general setting and following Lewin’s idea of generalized intervals [5], one can consider homometry in the context of any simply transitive group action. For instance it is well-known that the T/I-group and the neo-Riemannian PLR-group both act simply transitively on the set $S$ of major and minor triads. We recall that the T/I-group is generated by the transpositions $T_p(x) = p + x$ and the inversions $I_p(x) = T_p I_0(x)$, i.e. $I_p(x) = -x + p$, for $p \in \mathbb{Z}_n$. The PLR-group is generated by $P$, $L$, and $R$, which correspond respectively to the operations parallel, leading tone exchange and relative (see [8] for more details). The interval between two triads $s_1$ and $s_2$ is the unique element of the group sending $s_1$ to $s_2$ for the chosen group action. If we use upper-case letters for major triads and lower-case letters for minor triads ($C$ is C-major and $c$ is C-minor) we obtain for instance in the context of the T/I-group: $\mathrm{int}_{T/I}(c, E\flat) = I_{10}$, and in the context of the PLR-group: $\mathrm{int}_{PLR}(c, E\flat) = R$. For a given Generalized Interval System $(S, \mathrm{IVLS}, \mathrm{int})$, we can thus define the interval vector of a subset $A$ of $S$ as

$$\mathrm{iv}(A)(k) = \#\{(a, b) \in A^2 \mid \mathrm{int}(a, b) = k\}$$

for $k$ in IVLS, and two subsets of $S$ will be called homometric if they have the same interval vector. As an example of homometry for both the actions of the T/I-group and the PLR-group, consider the pair of sets $\{c, D\flat, E\flat, e, a\flat\}$ and $\{c, E\flat, e, F, a\flat\}$. Figure 1 shows some of the intervals between the elements of these sets, for the action of the T/I-group and the PLR-group respectively. It can clearly be seen that the same intervals $\{T_2, T_4, T_4, T_8, I_0, I_2, I_4, I_6, I_8, I_{10}\}$ are present in both sets, hence they have the same interval vector (all other intervals can be deduced by composition and/or inversion). Similarly, the same intervals
{F, PL, PL, PL, LPL, LPP, PLP, LPP, PPP, PRLR} are present in both sets 
leading to an identical conclusion for the interval vector. 

It has been shown in [9] that the actions of the T/I-group and of the PLR-group on the set of major and minor triads can be understood respectively as the left and the right actions of the dihedral group $D_{12}$ on this set. Moreover it is well-known that $D_{12}$ is the semi-direct product $(\mathbb{Z}_{12}, +) \rtimes (\mathbb{Z}_2, \times)$. This leads us to the general topic of this paper, namely the study of homometry in the non-commutative dihedral groups of order $2n$, or equivalently the semi-direct products $(\mathbb{Z}_n, +) \rtimes (\mathbb{Z}_2, \times)$.

Fig. 1. Intervals in the T/I-group (top) and the PLR-group (bottom) for the two homometric sets $\{c, D\flat, E\flat, e, a\flat\}$ and $\{c, E\flat, e, F, a\flat\}$.

This paper is divided into four parts. We first recall the definition of $D_n$ as a semi-direct product and we define homometry for the right and left actions of this group. The link with the well-known musical case $n = 12$ will be explained in the second part. In the third part we give the equations that characterize homometry and some results concerning enumeration. Finally we define in the last part the concept of lift, which bridges homometry in $\mathbb{Z}_n$ and homometry in $D_n$, and we give our main results using the discrete Fourier transform.

2 The Dihedral Group $D_n$ as a Semi-direct Product

The dihedral group $D_n$ is the group of symmetries of a regular $n$-gon, using rotations and reflections. It can be expressed as the semi-direct product $(\mathbb{Z}_n, +) \rtimes (\mathbb{Z}_2, \times)$, where $\mathbb{Z}_2 = \{\pm 1\}$. Its elements are the pairs $(k, e)$ where $k \in \mathbb{Z}_n$ and $e = \pm 1$, with the identity element being $(0, 1)$, and multiplication between two elements being given by the equation

$$(k, e)(l, \eta) = (k + el, e\eta). \qquad (1)$$

As a non-commutative group $D_n$ acts on itself by right or left multiplication: thus $(l, \eta)$ acting on $(k, e)$ on the right leads us to Eq. (1), whereas $(l, \eta)$ acting on $(k, e)$ on the left leads us to

$$(l, \eta)(k, e) = (l + \eta k, \eta e). \qquad (2)$$
This allows us to define the intervals between any two pairs $(k_1, e_1)$ and $(k_2, e_2)$ of $D_n$. The left interval is the unique element $(l, \eta)$ in $D_n$ such that $(l, \eta)(k_1, e_1) = (k_2, e_2)$, whereas the right interval is the unique element $(l, \eta)$ such that $(k_1, e_1)(l, \eta) = (k_2, e_2)$. Thus we obtain two functions ${}^{l}\mathrm{int} : D_n \times D_n \to D_n$ and ${}^{r}\mathrm{int} : D_n \times D_n \to D_n$, called interval functions and defined as

$${}^{l}\mathrm{int}((k_1, e_1), (k_2, e_2)) = (k_2 - (e_2/e_1)\,k_1,\ e_2/e_1), \qquad (3)$$

$${}^{r}\mathrm{int}((k_1, e_1), (k_2, e_2)) = ((k_2 - k_1)/e_1,\ e_2/e_1). \qquad (4)$$

The left interval vector ${}^{l}\mathrm{iv}(A)$ and right interval vector ${}^{r}\mathrm{iv}(A)$ of a set $A$ in $D_n$ are then defined as

$${}^{l,r}\mathrm{iv}(A)((l, \eta)) = \#\{((k_1, e_1), (k_2, e_2)) \in A^2 \mid {}^{l,r}\mathrm{int}((k_1, e_1), (k_2, e_2)) = (l, \eta)\}$$

for $(l, \eta) \in D_n$. We say that two sets in $D_n$ are homometric for the left (resp. for the right) action (or simply left-/right-homometric) if they have the same left (resp. right) interval vector. It is easy to see that any left (resp. right) action preserves the right (resp. left) intervals. As a consequence, two left-homometric (resp. right-homometric) sets will be called non-trivially homometric if they are not related by right (resp. left) translation. In the rest of the paper ‘homometric’ will mean ‘non-trivially homometric’.
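The definitions of this section translate directly into code. The sketch below is an illustration under stated assumptions: elements of $D_n$ are Python tuples `(k, e)` with `e` equal to `1` or `-1`, division by $e_1$ is replaced by multiplication (since $e_1 = \pm 1$ is its own inverse), and the two example sets are the triad sets from the introduction, encoded with the identification of triads and pairs that Section 3 describes. Function names are hypothetical.

```python
from collections import Counter

N = 12

def mul(x, y, n=N):
    """Product in the semi-direct product: (k, e)(l, eta) = (k + e*l, e*eta)."""
    (k, e), (l, eta) = x, y
    return ((k + e * l) % n, e * eta)

def left_int(x, y, n=N):
    """Unique g with g * x = y."""
    (k1, e1), (k2, e2) = x, y
    eta = e2 * e1
    return ((k2 - eta * k1) % n, eta)

def right_int(x, y, n=N):
    """Unique g with x * g = y."""
    (k1, e1), (k2, e2) = x, y
    return ((e1 * (k2 - k1)) % n, e2 * e1)

def interval_vector(A, interval):
    """Multiset of intervals over all ordered pairs of A."""
    return Counter(interval(x, y) for x in A for y in A)

A = {(0, -1), (1, 1), (3, 1), (4, -1), (8, -1)}   # c, Db, Eb, e, ab
B = {(0, -1), (3, 1), (4, -1), (5, 1), (8, -1)}   # c, Eb, e, F, ab

print(interval_vector(A, left_int) == interval_vector(B, left_int))
print(interval_vector(A, right_int) == interval_vector(B, right_int))
```

Equality of the interval vectors shows homometry only; verifying that the two sets are not translates of each other would require an additional loop over the 24 group elements.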

3 Link with the T/I and the PLR-groups in the Case $n = 12$


As mentioned in [9], the actions of the T/I-group and the PLR-group on the set of major and minor triads can be considered as the left and right actions of $D_{12}$ on $S$, but also as the actions of $D_{12}$ on itself. To understand why, we use a (non-canonical) bijection between $D_{12}$ and $S$. The element $(s, +1)$ of $D_{12}$ will be identified with the major triad whose root is $s \in \mathbb{Z}_{12}$, whereas the element $(s, -1)$ of $D_{12}$ will be considered as the minor triad whose root is $s \in \mathbb{Z}_{12}$. For instance $(0, 1)$ corresponds to $C$, $(0, -1)$ corresponds to $c$, $(8, -1)$ corresponds to $a\flat$, and so on. The set $\{c, D\flat, E\flat, e, a\flat\}$ given in the introduction can then be identified with the set $\{(0, -1), (1, 1), (3, 1), (4, -1), (8, -1)\}$.

Notice that this bijection with the elements of $D_{12}$ is not limited to the set of major and minor triads, but may be applied to any set on which $D_{12}$ acts simply transitively, as mentioned in the introduction. For instance, Lewin used an action of $D_{12}$ on set-class 5-4 in his analysis of Stockhausen’s Klavierstück III [10]. This is also true for homometry in $D_n$ in general. Nevertheless, we have decided to focus in this paper on the actions of $D_{12}$ on major and minor triads, as they belong to the most musically relevant examples of group actions.

If we consider the left action of $D_{12}$ on itself we have the following isomorphism between $D_{12}$ (as the acting group) and the T/I-group: $(p, +1)$ corresponds to $T_p$ and $(p, -1)$ corresponds to $I_{p-5}$. For instance the image of $C$ by $T_7$ is calculated in $D_{12}$ as the element corresponding to $(7, 1)(0, 1) = (7, 1)$, i.e. the major chord $G$. Similarly the image of $C$ by $I_2$ is calculated to be the element corresponding to $(7, -1)(0, 1) = (7, -1)$, i.e. the minor chord $g$.

If we consider the right action of $D_{12}$ on itself we have the following bijection between $D_{12}$ and the PLR-group: $P$ corresponds to $(0, -1)$, $L$ corresponds to $(4, -1)$ and $R$ corresponds to $(9, -1)$. For instance $P(C)$ is calculated as the element corresponding to $(0, 1)(0, -1) = (0, -1)$, which is $c$; $L(d\flat)$ corresponds to $(1, -1)(4, -1) = (9, 1)$, which is $A$; and $R(E)$ corresponds to $(4, 1)(9, -1) = (1, -1)$, which is $d\flat$. We obtain in Fig. 2 a new version of Fig. 1 with the left and the right intervals in $D_{12}$.
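The dictionary between triads and elements of $D_{12}$ can be tested against ordinary pitch-class arithmetic. The sketch below assumes the encoding described above (root $s$, sign $+1$ for major and $-1$ for minor), spells triads as the pitch-class sets $\{s, s+4, s+7\}$ and $\{s, s+3, s+7\}$, and uses flat-only note names; `T`, `I`, `triad` and `name` are illustrative helpers, not notation from the paper.

```python
NOTES = ["C", "Db", "D", "Eb", "E", "F", "Gb", "G", "Ab", "A", "Bb", "B"]

def mul(x, y):
    """Product in D_12: (k, e)(l, eta) = (k + e*l, e*eta)."""
    (k, e), (l, eta) = x, y
    return ((k + e * l) % 12, e * eta)

def name(x):
    """Upper case for major triads, lower case for minor triads."""
    k, e = x
    return NOTES[k] if e == 1 else NOTES[k].lower()

def triad(x):
    """Pitch-class content of the triad encoded by (root, sign)."""
    s, e = x
    third = 4 if e == 1 else 3
    return frozenset({s % 12, (s + third) % 12, (s + 7) % 12})

def T(p):
    """Element of D_12 acting on the left as the transposition T_p."""
    return (p % 12, 1)

def I(p):
    """Element of D_12 acting on the left as the inversion I_p(x) = p - x."""
    return ((p + 5) % 12, -1)

P, L, R = (0, -1), (4, -1), (9, -1)   # acting on the right

print(name(mul(T(7), (0, 1))), name(mul(I(2), (0, 1))))
print(name(mul((0, 1), P)), name(mul((1, -1), L)), name(mul((4, 1), R)))
```

Comparing `triad(mul(I(p), x))` with the pointwise inversion of `triad(x)` for all `p` and `x` confirms that the shift by 5 in `I` is consistent with this particular choice of bijection.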




Fig. 2. Left (top) and right (bottom) intervals in $D_{12}$ for the two sets $\{c, D\flat, E\flat, e, a\flat\}$ and $\{c, E\flat, e, F, a\flat\}$.

Since we are interested in the general case of homometry in $D_n$, we will work from now on using the general point of view of semi-direct products and their group elements $(l, \eta)$.

4 Homometry in $D_n$: Formulas and Enumeration

In order to avoid confusion we will adopt the notation ‘$\mathcal{A}$’ for subsets in $D_n$, and the notation ‘$A$’ for subsets in $\mathbb{Z}_n$. Given the form of the group elements of $D_n$ as pairs $(l, \eta)$, a set $\mathcal{A} \subset D_n$ is the disjoint union of two (possibly empty) subsets $\mathcal{A}_+$ and $\mathcal{A}_-$, with $\mathcal{A}_+ = \{(l, \eta) \in \mathcal{A} \mid \eta = 1\}$ and $\mathcal{A}_- = \{(l, \eta) \in \mathcal{A} \mid \eta = -1\}$. For instance in $D_{12}$ the set

$$\mathcal{A} = \{c, D\flat, E\flat, e, a\flat\} = \{(0, -1), (1, 1), (3, 1), (4, -1), (8, -1)\}$$

given in the introduction is the union of the major chords $\{D\flat, E\flat\} = \{(1, 1), (3, 1)\}$ and the minor chords $\{c, e, a\flat\} = \{(0, -1), (4, -1), (8, -1)\}$. Let $\pi : D_n \to \mathbb{Z}_n$ be the projection on the first factor, i.e. $\pi((l, \eta)) = l$. We define the sets $A_+$ and $A_-$ as $A_+ = \pi(\mathcal{A}_+) = \{\pi((l, \eta)) \mid (l, \eta) \in \mathcal{A}_+\}$, and $A_- = \pi(\mathcal{A}_-) = \{\pi((l, \eta)) \mid (l, \eta) \in \mathcal{A}_-\}$. In the example above, we have $A_+ = \{1, 3\} \subset \mathbb{Z}_{12}$ and $A_- = \{0, 4, 8\} \subset \mathbb{Z}_{12}$. Remark that we have $\pi(\mathcal{A}) = \pi(\mathcal{A}_+) \cup \pi(\mathcal{A}_-) = A_+ \cup A_-$. If there is no ambiguity we will call $A = \pi(\mathcal{A})$.

The purpose of the following theorem is to give a characterization of
homometry in $D_n$ using iv and ifunc.

Theorem 1. Two sets $\mathcal{A}$ and $\mathcal{B}$ in $D_n$ are homometric for
the right action if and only if the following two equations hold:

$$\begin{cases} \mathrm{iv}(A_+) + \mathrm{iv}(A_-) = \mathrm{iv}(B_+) + \mathrm{iv}(B_-) \\ \mathrm{ifunc}(A_+, A_-) = \mathrm{ifunc}(B_+, B_-) \end{cases} \tag{5}$$

Two sets $\mathcal{A}$ and $\mathcal{B}$ in $D_n$ are homometric for the left
action if and only if the following two equations hold:

$$\begin{cases} \mathrm{iv}(A_+) + \mathrm{iv}(A_-) = \mathrm{iv}(B_+) + \mathrm{iv}(B_-) \\ \mathrm{ifunc}(I_0 A_+, A_-) = \mathrm{ifunc}(I_0 B_+, B_-) \end{cases} \tag{6}$$

Proof. Let $\mathcal{A}$, $\mathcal{B}$ be two right homometric sets in $D_n$.
Let us recall that

$${}^{r}\mathrm{int}((k_1, e_1), (k_2, e_2)) = ((k_2 - k_1)/e_1,\ e_2/e_1) \tag{7}$$

for $(k_1, e_1)$ and $(k_2, e_2)$ in $\mathcal{A}$. We then have to consider two
cases, corresponding to the two equations of (5).

In the first case, $e_2/e_1 = 1$, i.e. $e_1 = e_2$. Then,

- either $e_1 = 1 = e_2$, in which case we have
  ${}^{r}\mathrm{int}((k_1, e_1), (k_2, e_2)) = (k_2 - k_1, 1)$ for $k_1$ and
  $k_2$ in $A_+$, meaning that we have to calculate $\mathrm{iv}(A_+)$ to obtain
  all the intervals of that type, or

- $e_1 = -1 = e_2$, in which case
  ${}^{r}\mathrm{int}((k_1, e_1), (k_2, e_2)) = (k_1 - k_2, 1)$ for $k_1, k_2$
  in $A_-$, meaning that we have to calculate $\mathrm{iv}(A_-)$ to obtain all
  the intervals of that type.

We then must have
$\mathrm{iv}(A_+) + \mathrm{iv}(A_-) = \mathrm{iv}(B_+) + \mathrm{iv}(B_-)$.

In the second case, $e_2/e_1 = -1$, i.e. $e_1 = -e_2$. Then,

- either $e_1 = 1$, $e_2 = -1$, thus we have
  ${}^{r}\mathrm{int}((k_1, e_1), (k_2, e_2)) = (k_2 - k_1, -1)$ for
  $k_1 \in A_+$ and $k_2 \in A_-$, meaning that we have to calculate
  $\mathrm{ifunc}(A_+, A_-)$ to obtain all the intervals of that type, or

- $e_1 = -1$, $e_2 = 1$, in which case
  ${}^{r}\mathrm{int}((k_1, e_1), (k_2, e_2)) = (k_1 - k_2, -1)$ for
  $k_1 \in A_-$ and $k_2 \in A_+$, meaning that we have to calculate
  $\mathrm{ifunc}(A_+, A_-)$ again to obtain all the intervals of that type.

We then must have $\mathrm{ifunc}(A_+, A_-) = \mathrm{ifunc}(B_+, B_-)$, which
leads to the two equations of (5). Conversely, if two sets satisfy (5), then
they have the same right interval vector. The argument is similar for left
intervals, the difference being that when $e_2/e_1 = -1$ we obtain
${}^{l}\mathrm{int}((k_1, e_1), (k_2, e_2)) = (k_1 + k_2, -1)$, and we thus
calculate $\mathrm{ifunc}(I_0 A_+, A_-)$. □
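The right-action half of Theorem 1 can be checked numerically on a small example. The sketch below compares the direct count of right intervals with the two equations of (5). The function names are illustrative and not taken from the paper; `ifunc(A, B)(t)` is assumed to count the pairs $(a, b) \in A \times B$ with $b - a = t$; and the set `T` is an arbitrary non-homometric comparison set introduced only for this illustration.

```python
from collections import Counter

def right_int(g1, g2, n):
    """Right interval of Eq. (7); dividing by e1 = +-1 is multiplying by e1."""
    (k1, e1), (k2, e2) = g1, g2
    return ((e1 * (k2 - k1)) % n, e1 * e2)

def right_iv(S, n):
    """Multiset of right intervals over all ordered pairs of S."""
    return Counter(right_int(g, h, n) for g in S for h in S)

def ifunc(A, B, n):
    """Assumed convention: ifunc(A, B)(t) = #{(a, b) in A x B : b - a = t}."""
    return Counter((b - a) % n for a in A for b in B)

def iv(A, n):
    return ifunc(A, A, n)

def split(S):
    """Projections A_+ and A_- of a subset of D_n."""
    return ({k for k, e in S if e == 1}, {k for k, e in S if e == -1})

def theorem1_right(S, T, n):
    """The two equations of (5)."""
    Ap, Am = split(S)
    Bp, Bm = split(T)
    return (iv(Ap, n) + iv(Am, n) == iv(Bp, n) + iv(Bm, n)
            and ifunc(Ap, Am, n) == ifunc(Bp, Bm, n))

n = 12
S1 = {(0, 1), (1, -1), (4, -1), (6, 1)}
S2 = {(0, 1), (1, -1), (3, 1), (7, -1)}
T = {(0, 1), (1, -1), (4, -1), (7, 1)}   # hypothetical comparison set

direct_S1_S2 = right_iv(S1, n) == right_iv(S2, n)
direct_S1_T = right_iv(S1, n) == right_iv(T, n)
```

On these inputs the direct count and the criterion of Theorem 1 agree in both the positive and the negative case.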

We can notice that the first equations in both (5) and (6) are identical, but
the second one shows an important difference. For left homometry it is
symmetric between $A_+$ and $A_-$ (and between $B_+$ and $B_-$), whereas it is
not for right homometry, due to the fact that
$\mathrm{ifunc}(I_0 A_+, A_-) = \mathrm{ifunc}(I_0 A_-, A_+)$.

We now describe a simple way to build left homometric sets from right
homometric sets and conversely, using the inversion operator $I$. Since we have
$(k, 1)^{-1} = (-k, 1)$ and $(k, -1)^{-1} = (k, -1)$ for any $k$ in
$\mathbb{Z}_n$, it is easy to calculate $I(\mathcal{A})$ for a set $\mathcal{A}$
in $D_n$ by taking the inverse of $\mathcal{A}_+$ and keeping $\mathcal{A}_-$
unchanged. For example, if
$\mathcal{A} = \{(0,-1), (1,1), (3,1), (4,-1), (8,-1)\} \subset D_{12}$, we
obtain $I(\mathcal{A}) = \{(0,-1), (11,1), (9,1), (4,-1), (8,-1)\}$. This leads
to the following result, the proof of which we omit here.

Proposition 2. Let $\mathcal{A}$ and $\mathcal{B}$ be two sets in $D_n$.
$\mathcal{A}$ and $\mathcal{B}$ are non-trivially right homometric if and only
if $I(\mathcal{A})$ and $I(\mathcal{B})$ are non-trivially left homometric.

Corollary 3. For all $n \in \mathbb{N}$, the number of right homometric sets in
$D_n$ is equal to the number of left homometric sets. Besides, we can deduce all
the left homometric sets from the right homometric sets (and conversely) using
the inversion $I$.
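A short Python sketch of the inversion operator and of the mechanism behind Proposition 2. The names `inv`, `invert_set`, `left_iv` and `right_iv` are illustrative; the left interval is taken here as $g_2 g_1^{-1}$ and the right interval as $g_1^{-1} g_2$, which is an assumption consistent with the formulas quoted in the proof of Theorem 1. The sets `S1` and `S2` are the right homometric lifts constructed in Section 5.

```python
from collections import Counter

def mul(g, h, n):
    (k1, e1), (k2, e2) = g, h
    return ((k1 + e1 * k2) % n, e1 * e2)

def inv(g, n):
    """(k, 1)^-1 = (-k, 1) and (k, -1)^-1 = (k, -1)."""
    k, e = g
    return ((-e * k) % n, e)

def invert_set(S, n):
    """The operator I: invert every element of S."""
    return {inv(g, n) for g in S}

def right_iv(S, n):
    return Counter(mul(inv(g, n), h, n) for g in S for h in S)

def left_iv(S, n):
    return Counter(mul(h, inv(g, n), n) for g in S for h in S)

n = 12
A = {(0, -1), (1, 1), (3, 1), (4, -1), (8, -1)}
IA = invert_set(A, n)

S1 = {(0, 1), (1, -1), (4, -1), (6, 1)}
S2 = {(0, 1), (1, -1), (3, 1), (7, -1)}
right_equal = right_iv(S1, n) == right_iv(S2, n)
left_equal_after_I = left_iv(invert_set(S1, n), n) == left_iv(invert_set(S2, n), n)
```

The left intervals of $I(\mathcal{S})$ are exactly the right intervals of $\mathcal{S}$, since $h^{-1}(g^{-1})^{-1} = h^{-1}g$; this is why inverting both sets turns right homometry into left homometry.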

This result is useful for the problem of enumerating left- and right-homometric
sets in $D_n$ (which, as in $\mathbb{Z}_n$, is an open problem [7]), since the
calculation needs only to be done for right (or left) homometric sets. However,
homometries for the right and for the left actions behave very differently on
one specific point, as the following proposition shows.

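Proposition 4. If $\mathcal{A}$ and $\mathcal{B}$ are right homometric in
$D_n$, then their projections $A = \pi(\mathcal{A})$ and $B = \pi(\mathcal{B})$
are homometric in $\mathbb{Z}_n$. Besides, if the homometry is trivial in $D_n$,
the homometry is also trivial between the projections in $\mathbb{Z}_n$.

Proof. We just prove the first part of the proposition. If $\mathcal{A}$ and
$\mathcal{B}$ are right homometric in $D_n$, we have

$$\begin{aligned} \mathrm{iv}(A) &= \mathrm{iv}(A_+) + \mathrm{iv}(A_-) + \mathrm{ifunc}(A_+, A_-) + \mathrm{ifunc}(A_-, A_+) \\ &= \mathrm{iv}(B_+) + \mathrm{iv}(B_-) + \mathrm{ifunc}(B_+, B_-) + \mathrm{ifunc}(B_-, B_+) \\ &= \mathrm{iv}(B). \end{aligned}$$ □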

In other words, homometry for the right action in $D_n$ "implies" homometry
in $\mathbb{Z}_n$. However, left homometry does not, as the pair of sets

$$(\{(0,1), (1,-1), (2,1), (5,-1), (7,-1)\},\ \{(0,1), (1,-1), (6,1), (7,-1), (8,1)\})$$

in $D_{10}$ shows. These sets are left homometric, but their projections
$\{0,1,2,5,7\}$ and $\{0,1,6,7,8\}$ are not homometric in $\mathbb{Z}_{10}$.
Proposition 4 raises the question of whether left- or right-homometric sets in
$D_n$ can be found from homometric sets in $\mathbb{Z}_n$. In other words, can
we split two homometric sets $A$ and $B$ in $\mathbb{Z}_n$ into subsets
$(A_+, A_-)$ and $(B_+, B_-)$ such that the corresponding sets in $D_n$ are
homometric? This question will be considered in the following section with the
definition of the concept of lift.
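The counterexample above is small enough to verify by direct counting. The following sketch does so for $n = 10$; the helper names are illustrative, and the left and right intervals are again taken as $g_2 g_1^{-1}$ and $g_1^{-1} g_2$ respectively (an assumption consistent with the proof of Theorem 1).

```python
from collections import Counter

def mul(g, h, n):
    (k1, e1), (k2, e2) = g, h
    return ((k1 + e1 * k2) % n, e1 * e2)

def inv(g, n):
    k, e = g
    return ((-e * k) % n, e)

def left_iv(S, n):
    return Counter(mul(h, inv(g, n), n) for g in S for h in S)

def right_iv(S, n):
    return Counter(mul(inv(g, n), h, n) for g in S for h in S)

def iv(A, n):
    """Interval vector in Z_n, counted over ordered pairs."""
    return Counter((b - a) % n for a in A for b in A)

n = 10
SA = {(0, 1), (1, -1), (2, 1), (5, -1), (7, -1)}
SB = {(0, 1), (1, -1), (6, 1), (7, -1), (8, 1)}
A = {k for k, _ in SA}   # projection {0, 1, 2, 5, 7}
B = {k for k, _ in SB}   # projection {0, 1, 6, 7, 8}

left_homometric = left_iv(SA, n) == left_iv(SB, n)
right_homometric = right_iv(SA, n) == right_iv(SB, n)
projections_homometric = iv(A, n) == iv(B, n)
```

The three flags come out as left homometric, not right homometric, and projections not homometric, which is the situation that Proposition 4 rules out for the right action only.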

Before moving to this section we give some computational results concerning
the enumeration of homometric sets in $D_n$. By a brute-force approach, a
complete enumeration of such sets was performed, with cardinality $p$ equal to
4, 5, or 6 for n < 18, and with cardinality equal to 7 for n < 15. The first
homometric pair appears for n = 8, p = 4 (a direct result of Proposition 4 and
known results about homometry in $\mathbb{Z}_n$). Homometric t-tuples with
t > 2 also exist, the first triple appearing for n = 12 and p = 5
(interestingly, the first homometric triple in $\mathbb{Z}_n$ only appears for
n = 16 and p = 6). The first simultaneously right- and left-homometric pair
appears for n = 8 and p = 5. Table 1 gives a complete list of left and right
homometric pairs and triples written in musical form for n = 12 with p = 4 and
p = 5. Notice that the first two pairs with p = 5 in this table are both left-
and right-homometric.


Table 1. Left- and right-homometric sets of cardinality p in $D_{12}$ (n = 12),
given in musical form.

| Type | Homometric sets for the action of the T/I-group (left action) | Homometric sets for the action of the PLR-group (right action) |
|---|---|---|
| p = 4, pairs | {C,c,e♭,G♭} & {C,c,g♭,A} | {C,c,e♭,G♭} & {C,c,E♭,g♭} |
| | {C,d♭,e,G♭} & {C,d♭,g,A} | {C,d♭,e,G♭} & {C,d♭,E♭,g} |
| | {C,d,f,G♭} & {C,d,a♭,A} | {C,d,f,G♭} & {C,d,E♭,a♭} |
| p = 5, pairs | {C,c,d,E,A♭} & {C,d,e,E,A♭} | {C,c,d,E,A♭} & {C,d,e,E,A♭} |
| | {C,d♭,e♭,E,A♭} & {C,e♭,E,f,A♭} | {C,d♭,e♭,E,A♭} & {C,e♭,E,f,A♭} |
| | {C,c,d♭,f,G♭} & {C,c,g♭,G,B} | {C,c,d♭,f,G♭} & {C,c,D♭,F,g♭} |
| | {C,c,e♭,f,G♭} & {C,c,E♭,g♭,B} | {C,c,e,f,G♭} & {C,c,D♭,g♭,A♭} |
| | {C,c,D♭,g♭,A♭} & {C,c,g♭,G,A♭} | {C,c,E,F,g♭} & {C,c,E,g♭,B} |
| | {C,d♭,D♭,g,A♭} & {C,d♭,g,G,A♭} | {C,d♭,E,F,g} & {C,d♭,E,g,B} |
| | {C,D♭,d,a♭,A♭} & {C,d,G,a♭,A♭} | {C,d,E,F,a♭} & {C,d,E,a♭,B} |
| | {C,D♭,e♭,A♭,a} & {C,e♭,G,A♭,a} | {C,e♭,E,F,a} & {C,e♭,E,a,B} |
| p = 5, triples | {C,c,d,e♭,G♭} & {C,c,D,g♭,B♭} & {C,c,g♭,A♭,B♭} | {C,c,d,e,G♭} & {C,c,D,E,g♭} & {C,c,D,g♭,B♭} |
| | {C,d♭,e♭,f,G♭} & {C,d♭,D,g,B♭} & {C,d♭,g,A♭,B♭} | {C,d♭,e♭,f,G♭} & {C,d♭,D,E,g} & {C,d♭,D,g,B♭} |











5 The Concept of Lift — Using the Fourier Transform 


We begin this section with a definition motivated by Proposition 4. The notation
$\mathcal{P}(E)$ denotes the power set of the set $E$.

Definition 5. A lift is a map $l : \mathcal{P}(\mathbb{Z}_n) \to \mathcal{P}(D_n)$
such that $\pi \circ l = \mathrm{id}$. For a set $A \subset \mathbb{Z}_n$, we
call the set $l(A)$ the lift of $A$ for the lift $l$.

The question raised in the previous section can then be formulated as follows:
given two homometric sets $A$ and $B$ in $\mathbb{Z}_n$, is there a lift $l$
such that $l(A)$ and $l(B)$ are left- or right-homometric in $D_n$?

We use the Fourier transform to express these conditions, since it provides a
very convenient way to deal with the functions ifunc and iv for subsets in
$\mathbb{Z}_n$, as explained in the work of Amiot [2]. Let us recall that for
two subsets $A$ and $B$ of $\mathbb{Z}_n$ we have, for $t \in \mathbb{Z}_n$,

$$\mathrm{ifunc}(A, B)(t) = (1_{-A} \star 1_B)(t) = \sum_{k \in \mathbb{Z}_n} 1_A(k)\, 1_B(t + k) \tag{8}$$

If we apply the Fourier transform (denoted by
$\mathcal{F}_A := \mathcal{F}(1_A) : t \mapsto \sum_{k \in A} e^{-2i\pi kt/n}$)
to this convolution product, we obtain the classical result for
$t \in \mathbb{Z}_n$:

$$\mathcal{F}(\mathrm{ifunc}(A, B))(t) = \mathcal{F}_{-A}(t)\, \mathcal{F}_B(t) \tag{9}$$

As $\mathrm{iv}(A) = \mathrm{ifunc}(A, A)$ and
$\mathcal{F}_{-A}(t) = \overline{\mathcal{F}_A(t)}$, we deduce from Eq. (9) that
$\mathcal{F}(\mathrm{iv}(A)) = |\mathcal{F}_A|^2$, and we get the following
well-known characterization of homometry, $A$ and $B$ being two subsets of
$\mathbb{Z}_n$:

$$A \text{ is homometric with } B \iff |\mathcal{F}_A| = |\mathcal{F}_B| \tag{10}$$
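The characterization (10) is easy to test numerically. The sketch below computes the Fourier transform of a characteristic function with the standard library only and compares the moduli up to a floating-point tolerance; the names `fourier`, `homometric_fourier` and the tolerance `tol` are illustrative choices, not part of the paper.

```python
import cmath
from collections import Counter

def fourier(A, n):
    """F_A(t) = sum over k in A of exp(-2*i*pi*k*t/n), for t = 0, ..., n-1."""
    return [sum(cmath.exp(-2j * cmath.pi * k * t / n) for k in A)
            for t in range(n)]

def homometric_fourier(A, B, n, tol=1e-9):
    """Eq. (10): A and B are homometric iff |F_A| = |F_B| pointwise."""
    FA, FB = fourier(A, n), fourier(B, n)
    return all(abs(abs(x) - abs(y)) < tol for x, y in zip(FA, FB))

def iv(A, n):
    """Interval vector counted over ordered pairs, for cross-checking."""
    return Counter((b - a) % n for a in A for b in A)

n = 12
S1, S2 = {0, 1, 4, 6}, {0, 1, 3, 7}   # the all-interval tetrachords
U, V = {0, 1, 2}, {0, 1, 3}           # a pair with different interval content

tetrachords_ok = homometric_fourier(S1, S2, n)
trichords_ok = homometric_fourier(U, V, n)
```

Comparing with the combinatorial interval vector `iv` on the same inputs gives the same verdicts, as expected from $\mathcal{F}(\mathrm{iv}(A)) = |\mathcal{F}_A|^2$.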


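The use of the Fourier transform gives a new formulation of Theorem 1.

Theorem 6. Two sets $\mathcal{A}$ and $\mathcal{B}$ in $D_n$ are homometric for
the right action if and only if the two following equations hold:

$$\begin{cases} |\mathcal{F}_{A_+}|^2 + |\mathcal{F}_{A_-}|^2 = |\mathcal{F}_{B_+}|^2 + |\mathcal{F}_{B_-}|^2 \\ \overline{\mathcal{F}_{A_+}}\, \mathcal{F}_{A_-} = \overline{\mathcal{F}_{B_+}}\, \mathcal{F}_{B_-} \end{cases} \tag{11}$$

Two sets $\mathcal{A}$ and $\mathcal{B}$ in $D_n$ are homometric for the left
action if and only if the two following equations hold:

$$\begin{cases} |\mathcal{F}_{A_+}|^2 + |\mathcal{F}_{A_-}|^2 = |\mathcal{F}_{B_+}|^2 + |\mathcal{F}_{B_-}|^2 \\ \mathcal{F}_{A_+}\, \mathcal{F}_{A_-} = \mathcal{F}_{B_+}\, \mathcal{F}_{B_-} \end{cases} \tag{12}$$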

Recall that we want to decompose each set of a homometric pair in
$\mathbb{Z}_n$ into two subsets in order to lift them in $D_n$. The following
proposition gives a specific characterization of homometry in $\mathbb{Z}_n$ for
such a decomposition, using the Fourier transform.

Proposition 7. Let $A$ and $B$ be two sets in $\mathbb{Z}_n$ such that
$A = A_1 \cup A_2$ and $B = B_1 \cup B_2$. $A$ and $B$ are homometric if and
only if

$$|\mathcal{F}_{A_1}|^2 + |\mathcal{F}_{A_2}|^2 + 2\,\mathcal{R}e(\mathcal{F}_{A_1} \overline{\mathcal{F}_{A_2}}) = |\mathcal{F}_{B_1}|^2 + |\mathcal{F}_{B_2}|^2 + 2\,\mathcal{R}e(\mathcal{F}_{B_1} \overline{\mathcal{F}_{B_2}}) \tag{13}$$

Proof. We use Eq. (10) and the fact that

$$|\mathcal{F}_A|^2 = |\mathcal{F}_{A_1} + \mathcal{F}_{A_2}|^2 = |\mathcal{F}_{A_1}|^2 + |\mathcal{F}_{A_2}|^2 + 2\,\mathcal{R}e(\mathcal{F}_{A_1} \overline{\mathcal{F}_{A_2}}).$$ □

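We now give the main result of this paper, which solves the question of lift
in a special case.

Theorem 8. Let $A$ and $B$ be two homometric sets in $\mathbb{Z}_n$ such that
$A = A_1 \cup A_2$ and $B = B_1 \cup B_2$ with
$\mathrm{iv}(A_1) = \mathrm{iv}(B_1)$ and $\mathrm{iv}(A_2) = \mathrm{iv}(B_2)$.
We can always lift $A$ and $B$ into (non-trivial) right homometric sets in
$D_n$.

Proof. Let $A$ and $B$ be two homometric subsets satisfying the conditions of
the theorem. We know from Proposition 7 that

$$|\mathcal{F}_{A_1}|^2 + |\mathcal{F}_{A_2}|^2 + 2\,\mathcal{R}e(\mathcal{F}_{A_1} \overline{\mathcal{F}_{A_2}}) = |\mathcal{F}_{B_1}|^2 + |\mathcal{F}_{B_2}|^2 + 2\,\mathcal{R}e(\mathcal{F}_{B_1} \overline{\mathcal{F}_{B_2}})$$

As $\mathrm{iv}(A_1) = \mathrm{iv}(B_1)$ and
$\mathrm{iv}(A_2) = \mathrm{iv}(B_2)$, we deduce

$$|\mathcal{F}_{A_1}| = |\mathcal{F}_{B_1}| \quad\text{and}\quad |\mathcal{F}_{A_2}| = |\mathcal{F}_{B_2}| \tag{14}$$

so we get

$$\mathcal{R}e(\mathcal{F}_{A_1} \overline{\mathcal{F}_{A_2}}) = \mathcal{R}e(\mathcal{F}_{B_1} \overline{\mathcal{F}_{B_2}})$$

We also remark that
$|\mathcal{F}_{A_1} \overline{\mathcal{F}_{A_2}}| = |\mathcal{F}_{B_1} \overline{\mathcal{F}_{B_2}}|$
thanks to (14). We finally obtain the two following equations:

$$\begin{cases} \mathcal{R}e(\mathcal{F}_{A_1} \overline{\mathcal{F}_{A_2}}) = \mathcal{R}e(\mathcal{F}_{B_1} \overline{\mathcal{F}_{B_2}}) \\ |\mathcal{F}_{A_1} \overline{\mathcal{F}_{A_2}}| = |\mathcal{F}_{B_1} \overline{\mathcal{F}_{B_2}}| \end{cases} \tag{15}$$

These equations are of the form $\mathcal{R}e(z) = \mathcal{R}e(z')$ and
$|z| = |z'|$, which implies $z = z'$ or $z = \overline{z'}$, i.e.

$$\mathcal{F}_{A_1} \overline{\mathcal{F}_{A_2}} = \mathcal{F}_{B_1} \overline{\mathcal{F}_{B_2}} \quad\text{or}\quad \mathcal{F}_{A_1} \overline{\mathcal{F}_{A_2}} = \overline{\mathcal{F}_{B_1}}\, \mathcal{F}_{B_2}$$

In the first case
($\mathcal{F}_{A_1} \overline{\mathcal{F}_{A_2}} = \mathcal{F}_{B_1} \overline{\mathcal{F}_{B_2}}$),
if we choose $A_+ = A_2$, $A_- = A_1$, $B_+ = B_2$ and $B_- = B_1$, we get right
homometric sets in the dihedral group since (11) is verified. In the second case
($\mathcal{F}_{A_1} \overline{\mathcal{F}_{A_2}} = \overline{\mathcal{F}_{B_1}}\, \mathcal{F}_{B_2}$),
if we choose $A_+ = A_2$, $A_- = A_1$, $B_+ = B_1$ and $B_- = B_2$, we also get
right homometric sets. Thanks to Proposition 4 we know that this homometry is
not trivial. □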

This result not only proves the existence of right homometric lifts but also
gives a practical way to build these lifts, as shown in the example below. We
finish with two interesting corollaries of Theorem 8.

Corollary 9. In $\mathbb{Z}_{4n}$ (n > 2), we can always lift homometric sets
with cardinality equal to 4 into right homometric sets in $D_{4n}$.

Proof. Rosenblatt [11] proved that if $A$ and $B$ are homometric in
$\mathbb{Z}_{4n}$ with n > 2 and $\sharp(A) = \sharp(B) = 4$, then there exists
$a \in \{1, 2, \ldots, n-1\}$ such that

$$A = \{0, a, a + n, 2n\} \quad\text{and}\quad B = \{0, a, n, 2n + a\} \tag{16}$$

If we choose $A_1 = \{0, 2n\}$, $A_2 = \{a, a + n\}$, $B_1 = \{a, a + 2n\}$ and
$B_2 = \{0, n\}$, we are in the situation of Theorem 8, so we can lift these sets
into right homometric sets in $D_{4n}$. It is easy to verify that we have
$\mathcal{F}_{A_1} \overline{\mathcal{F}_{A_2}} = \overline{\mathcal{F}_{B_1}}\, \mathcal{F}_{B_2}$,
thus we know from Theorem 8 that we have to choose $A_+ = A_2$, $A_- = A_1$,
$B_+ = B_1$ and $B_- = B_2$ or, equivalently, $A_+ = A_1$, $A_- = A_2$,
$B_+ = B_2$ and $B_- = B_1$. □

Corollary 10. We can lift all the homometric sets in $\mathbb{Z}_{12}$ into
right homometric sets in $D_{12}$.

Proof. Goyette classifies the homometric sets in $\mathbb{Z}_n$ in four types
[12]. This classification is based on the existence of cyclic subsets contained
in the sets we consider. Goyette claims that homometric sets in
$\mathbb{Z}_{12}$ are only of type 1 and 2, which satisfy the conditions of
Theorem 8. □

In order to give an application of these two corollaries and an explicit
construction of lifts, we consider the example mentioned in the introduction,
based on the famous "all-interval tetrachords" $S_1 = \{0,1,4,6\}$ and
$S_2 = \{0,1,3,7\}$ in $\mathbb{Z}_{12}$. These sets are homometric of the form
of Eq. (16) with a = 1 (here n = 3). From the proofs of Corollary 9 and
Theorem 8 we know that if we choose $A_+ = \{0,6\}$, $A_- = \{1,4\}$,
$B_+ = \{0,3\}$ and $B_- = \{1,7\}$ we can lift $S_1$ and $S_2$ in $D_{12}$ into
the two right homometric sets

$$\mathcal{S}_1 = \{(0,1), (1,-1), (4,-1), (6,1)\}$$

$$\mathcal{S}_2 = \{(0,1), (1,-1), (3,1), (7,-1)\}$$

From a musical point of view, the two homometric melodies
$S_1$ = {C, D♭, E, G♭} and $S_2$ = {C, D♭, E♭, G} lift into the two
right-homometric chord sequences $\mathcal{S}_1$ = {C, d♭, e, G♭} and
$\mathcal{S}_2$ = {C, d♭, E♭, g}.

Corollary 10 is interesting when considering musical applications. Every
homometric melody in $\mathbb{Z}_{12}$ can be transformed into right-homometric
chord progressions with identical roots.
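The construction used in Corollary 9 can be sketched in a few lines of Python. The function names are illustrative; the choice of signs follows the second option given in the proof ($A_+ = A_1$, $A_- = A_2$, $B_+ = B_2$, $B_- = B_1$), and right homometry is checked by direct counting of the right intervals $g_1^{-1} g_2$.

```python
from collections import Counter

def right_iv(S, N):
    """Multiset of right intervals (e1*(k2 - k1), e1*e2) over ordered pairs."""
    return Counter(((e1 * (k2 - k1)) % N, e1 * e2)
                   for (k1, e1) in S for (k2, e2) in S)

def lift(plus, minus):
    """Lift a split set: elements of `plus` get sign +1, those of `minus` -1."""
    return {(k, 1) for k in plus} | {(k, -1) for k in minus}

def rosenblatt_lifts(n, a):
    """Lifts of A = {0, a, a+n, 2n} and B = {0, a, n, 2n+a} into D_{4n}."""
    N = 4 * n
    SA = lift({0, 2 * n}, {a, a + n})       # A_+ = A_1, A_- = A_2
    SB = lift({0, n}, {a, a + 2 * n})       # B_+ = B_2, B_- = B_1
    return SA, SB, N

SA, SB, N = rosenblatt_lifts(3, 1)          # the all-interval tetrachords
example_ok = right_iv(SA, N) == right_iv(SB, N)

# The same check for a few other small parameters.
all_ok = all(
    right_iv(X, M) == right_iv(Y, M)
    for n in range(2, 7) for a in range(1, n)
    for (X, Y, M) in [rosenblatt_lifts(n, a)]
)
```

With n = 3 and a = 1 this reproduces the two lifted sets of the example above.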

6 Conclusion 

The present paper is a first attempt to study homometry in the dihedral groups,
which are non-commutative and of particular interest for musical applications.
In the case n = 12 we interpreted it as homometry between sets of major and
minor triads (which can be, for instance, chord progressions), the left
intervals being elements of the T/I-group, and the right intervals being
elements of the PLR-group. As already mentioned, it is however possible to
replace triads by any subsets on which $D_n$ acts simply transitively, which is
interesting for tonal (but also atonal) music since we can use seventh chords or
more generally chords with k notes (k > 3).

We showed that there are some similarities between homometry for the right
and for the left actions, and deep links between homometry in $D_n$ and
homometry in $\mathbb{Z}_n$: the formulations require the same functions
(namely ifunc and iv), right-homometry in $D_n$ implies homometry in
$\mathbb{Z}_n$, and conversely in some cases we can build right homometric sets
in $D_n$ from homometric sets in $\mathbb{Z}_n$. However, there are still open
questions:

- A complete enumeration of homometric sets for small n and p based on a
  brute-force approach was performed, but the general problem of the
  possibility of an efficient enumeration is open;

- Can we lift homometric sets in $\mathbb{Z}_n$ into left-homometric sets in
  $D_n$, or into both right- and left-homometric sets?

- We can use a similar approach with the time-spans group (cf. [5]), which is
  the semi-direct product $(\mathbb{R}, +) \rtimes (\mathbb{R}_+, \cdot)$, for
  specific cases, but the general question of homometry is still open in this
  group.
Abstract. A 20-note scale is revisited (from Balzano and Zweifel) and
endowed with a version of Mazzola's theory of modulation based on
the symmetry group of the scale. Mazzola's theory has also been applied
by Muzzulini in the context of the usual 12-note equally tempered
chromatic scale. A modulation for a 7-note exotic scale, based on this
model, is presented in Sect. 4 to exemplify the algorithm by which the
modulation quanta are computed, that is, the sets of notes that permit
the calculation of the pivot progressions that lead from one scale to
another. Then, the modulation model based on symmetries is applied
to an 11-note diatonic scale, immersed within the 20-note scale, which
shows the viability of the symmetry model for this microtonal case. This
work is based on the premise that musical expression has an underlying
mathematical structure.


Keywords: Modulation by symmetries • Group theory • Microtonal scales •
Microtonal music • Modulation quanta


1 Introduction 

In 1980, Balzano [1] used group theory to describe the usual 12-note chromatic
scale as well as some microtonal scales. In 1996, Zweifel [9] used Balzano's
methods to study some of those scales in greater detail, describing their
harmonic structure as well. On the other hand, in 1985-1990 Mazzola [3,4]
developed his theory of modulation by symmetries. In 1995 Muzzulini [7] applied
Mazzola's theory to 7-note subsets (scales) contained within the 12-note
chromatic scale; among other scales, he applied the theory to the major and
minor scales.

The central result of the present work is a version, in a microtonal context,
of Mazzola's theory of modulation by symmetries. To be more specific, we apply
the theory to an 11-note diatonic scale, immersed in a 20-note chromatic scale.
This scale was studied by Zweifel [9], based on the group-theoretical properties
brought to light by Balzano [1]. To the best of our knowledge, Mazzola's theory
of modulation by symmetries has not been applied to microtonal scales before.
We present a table with complete information on how to modulate to most keys
in this microtonal context.

In Sect. 2 we recall, from Clough and Myerson [2], that the diatonic major
scale can be characterised as a subset of 7 notes, contained in a set of 12
notes (the chromatic scale), that fulfills some properties, namely: cardinality
equals variety (CV), structure determines multiplicity (SM) and Myhill's
property (MP). One can generalise these properties in order to detect the
existence of microtonal scales that possess them. Using language from Clough and
Myerson [2], a generalised chromatic scale is introduced as a division of the
octave in c equal parts. By taking a subset of size d from this generalised
chromatic scale, one can form a d-note scale, immersed in the c-note scale,
which is called a generalised diatonic scale. Specifically, we present the
construction of an 11-note diatonic scale immersed within a 20-note chromatic
scale.

In a similar manner, in Sect. 3, following Balzano and Zweifel's approach,
we generalise concepts that originally applied only to the division of the
octave in twelve equal parts. We say that a microtonal scale that shares some
characteristic properties of the usual major scale is a microtonal diatonic
scale, and we also refer to the corresponding microtonal chromatic scale. In the
same section we relate the harmonic structure of microtonal scales to group
theory.

In Sect. 4 we present Mazzola and Muzzulini's model of modulation by
symmetries, for $\mathbb{Z}_{12}$ and in the context of equal temperament (it is
necessary to highlight this point, given that Mazzola's model also contemplates
other tuning systems). Here we illustrate the algorithm that appears in [7] by
means of an example with an exotic 7-note scale. This algorithm describes how to
compute quanta of modulation which are, in essence, the sets of notes from which
the chords that permit a smooth transition from one scale to another are taken.

Finally, in the concluding section, we present a version of Mazzola’s theory of 
modulation by symmetries for the aforementioned 11-note microtonal diatonic 
scale. 

2 Construction of the Diatonic 11-Note Microtonal Scale 

In this section we will very briefly recall Clough and Myerson’s approach in [2]. 
We don’t describe their approach thoroughly, since our objective here is only to 
highlight the fact that the main scale studied in this article, a diatonic 11-note 
scale immersed in a chromatic 20-note scale, complies with the conditions and 
properties studied in [2] . For a complete account of results concerning the study 
of diatonic microtonal scales under this approach, please refer to [2]. 

Clough and Myerson [2] study certain mathematical properties of the usual
7-note diatonic major scale immersed within the chromatic 12-note scale. These
properties are: cardinality equals variety (CV), structure determines
multiplicity (SM) and Myhill's property (MP). Clough and Myerson generalise
these properties to microtonal diatonic scales immersed in arbitrary-size
chromatic scales. This amounts to saying that, instead of the usual 12-note
chromatic scale, we consider a set of c notes, which we call a chromatic scale.
From this chromatic scale, we take a d-note subset, which we call a diatonic
scale. Chromatic notes are denoted $C_0, C_1, C_2, \ldots, C_{c-1}$, whereas
diatonic notes are denoted $D_1, D_2, \ldots, D_d$.

Note that chromatic notes correspond to elements of the cyclic group
$\mathbb{Z}_c$.

The shortest way to summarise Clough and Myerson's results is by way of
example. We build a scale with parameters c = 20 and d = 11, that is, a diatonic
11-note scale immersed within a chromatic 20-note scale. These two integers are
coprime, so we may apply the following construction theorem, adapted from [2]:

Theorem 1 (Clough and Myerson). Given two integers c and d such that
(c, d) = 1, define $a_k = \left[\frac{ck}{d}\right] \pmod{c}$, where
$k = 0, \pm 1, \pm 2, \ldots$. Then the integers $a_k$ are the positions (within
the c-note chromatic scale) of the d notes which form a diatonic scale.

Now we apply this theorem to the construction of the aforementioned diatonic
11-note scale. Let k take values from 0 to 11; then we get the following values
for $ck/d$:

$$\frac{0}{11},\ \frac{20}{11},\ \frac{40}{11},\ \frac{60}{11},\ \frac{80}{11},\ \frac{100}{11},\ \frac{120}{11},\ \frac{140}{11},\ \frac{160}{11},\ \frac{180}{11},\ \frac{200}{11},\ \frac{220}{11}. \tag{1}$$

By taking the integer part of these numbers (mod 20), we get the sequence:

0,1,3,5,7,9,10,12,14,16,18,0. (2)

These integers represent the positions of the 11 diatonic notes within the
20-note chromatic scale. In analogy with the C major scale, it looks as if the
scale was written from B to B, since there is a one-chromatic-note distance from
0 to 1, so the note 0 would be similar to a leading note. That is, in order to
write the scale from C to C (i.e. with the leading note as the last one in the
scale), note "1" should be the first note in the scale, thus renamed "0"; and in
general all note positions should be displaced by 1. The scale is then
re-written as follows:

0,2,4,6,8,9,11,13,15,17,19,0. (3) 
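The construction of sequences (2) and (3) can be reproduced directly. The sketch below is an illustration only: the function names are not taken from [2], the bracket in the theorem is read as the integer part, and the displacement by 1 is implemented as a subtraction modulo c followed by sorting.

```python
def clough_myerson(c, d):
    """Positions a_k = floor(c*k/d) mod c for k = 0, ..., d-1."""
    return [(c * k // d) % c for k in range(d)]

def displace(scale, c, shift=1):
    """Rename note `shift` as 0, i.e. subtract `shift` from every position."""
    return sorted((x - shift) % c for x in scale)

def step_pattern(scale, c):
    """Spell the steps between consecutive notes: T = 2 chromatic notes, S = 1."""
    names = {2: "T", 1: "S"}
    closed = scale + [scale[0] + c]
    return "".join(names[b - a] for a, b in zip(closed, closed[1:]))

seq2 = clough_myerson(20, 11)         # sequence (2) without the repeated 0
seq3 = displace(seq2, 20)             # sequence (3) without the repeated 0
pattern20 = step_pattern(seq3, 20)

# The same recipe with c = 12, d = 7 yields the usual major scale.
major = displace(clough_myerson(12, 7), 12)
pattern12 = step_pattern(major, 12)
```

Running the same functions with c = 12 and d = 7 is a useful sanity check, since the result must be the familiar major scale.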

The scale built by using this theorem has the general properties studied
by Clough and Myerson: Cardinality equals Variety (CV), Structure determines
Multiplicity (SM), Myhill's Property (MP), and is a reduced scale. The meaning
and consequences of these properties can be stated in a condensed form:

1. Between two consecutive diatonic notes there is either one chromatic note, or 
none at all (that is, the diatonic interval of a second has only two possible 
chromatic sizes); and 

2. There is a generalised circle of fifths, which in the case of the 11-note diatonic 
scale is a circle of sevenths , since such generalised fifth is the chromatic note 
11, which is a diatonic seventh from the note 0. 

By defining a tone as a distance two chromatic notes apart, and a semi-tone 
as a distance one chromatic note apart, the distances between the notes of this 
diatonic scale form the following structure: 


$$\underbrace{T\;T\;T\;T\;S}\;\;T\;\;\underbrace{T\;T\;T\;T\;S}, \tag{4}$$

which is very similar to the structure of the usual major scale:

$$\underbrace{T\;T\;S}\;\;T\;\;\underbrace{T\;T\;S}, \tag{5}$$

in the sense that both scales consist of a structure that appears repeated at a
distance of a whole tone. Such structure is, in the case of the usual major
scale, the major tetrachord T T S, and for the 11-note scale, a hexachord, which
we may interpret as a generalised major tetrachord: T T T T S. The structure of
the usual major scale, written by tetrachords, is:

$$\underbrace{0, 2, 4, 5,}\;\; \underbrace{7, 9, 11, 0,} \tag{6}$$

whereas the structure of the 11-note diatonic scale, written by hexachords, is:

$$\underbrace{0, 2, 4, 6, 8, 9,}\;\; \underbrace{11, 13, 15, 17, 19, 0}. \tag{7}$$


3 Group Theory and the Harmonic Structure 
of Microtonal Scales 

In this section we present some results by Balzano [1] and Zweifel [9]. Since we
are particularly interested in the scale identified with $\mathbb{Z}_{20}$, here
we only give a short account of this topic. For a complete presentation, please
refer to [1,6,9].

The fundamental fact is that the usual chromatic scale may be represented
as the cyclic group $\mathbb{Z}_{12}$. In fact, it may be represented as any one
of three isomorphic groups: $\mathbb{Z}_{12}$ generated either by its element 1
or by its inverse 11; $\mathbb{Z}_{12}$ generated either by its element 7 or by
its inverse 5 (the sequence of notes generated by the element 1 is the ascending
chromatic scale, and the sequence generated by 11 is the descending chromatic
scale; whereas the element 7 generates the ascending circle of fifths and the
element 5 generates the descending circle of fifths). The remaining
representation of the chromatic scale is as the product group
$\mathbb{Z}_3 \times \mathbb{Z}_4$.

Following Balzano [1], the group $\mathbb{Z}_{12}$, as generated by its element
1, is called semi-tone group, and it reflects melodic relations among notes of
the scale. If the group is seen as generated by its element 7, which corresponds
to the diatonic interval of a perfect fifth, we call it group of fifths.

On the other hand, the group $\mathbb{Z}_3 \times \mathbb{Z}_4$ is called group
of thirds, since $\mathbb{Z}_3$ represents the augmented chord, composed of two
major thirds, and $\mathbb{Z}_4$ represents the diminished chord, composed of
two minor thirds. So, we can say that this representation of the scale models
its harmonic structure.

Analogous phenomena can be described when these ideas are applied to the
11-note diatonic scale immersed within the 20-note chromatic scale, constructed
at the end of the preceding section. Note that the element of
$\mathbb{Z}_{20}$ that corresponds to the generalised fifth, as described in
Sect. 2 (which is the note 11), is the same element that generates the group of
fifths, as defined in this section. The group
$\mathbb{Z}_4 \times \mathbb{Z}_5$, isomorphic to $\mathbb{Z}_{20}$, hints at
the harmonic structure of the scale.

From now on we sometimes use the number 12 or 20 as a subscript to indicate that
we refer to the 12-note chromatic scale, identified with $\mathbb{Z}_{12}$, or
to the 20-note chromatic scale, identified with $\mathbb{Z}_{20}$.
The following table identifies by letter each of the 11 notes belonging to
the diatonic scale immersed within the 20-note chromatic scale (Table 1).


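Table 1. Diatonic notes by letter in $\mathbb{Z}_{20}$.

| Number | 0 | 2 | 4 | 6 | 8 | 9 | 11 | 13 | 15 | 17 | 19 | 0 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Letter | C | D | E | F | G | H | I | J | K | A | B | C |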


At the end of the previous section, the structure of this scale was described in 
terms of hexachords and also in terms of whole-tone and semi-tone distances. In 
the case of the usual diatonic scale, the major mode determines which intervals 
are called major: the ones measured from the tonic C 20 to each one of the notes 
of the diatonic scale. 

For the 11-note diatonic scale immersed within $\mathbb{Z}_{20}$, we similarly
identify the major intervals through the major mode, measuring distances from
the note 0. Table 2 summarizes the intervals: the letter M indicates major
intervals; on the other hand, the letter P indicates perfect intervals, namely
the unison and the twelfth$_{20}$, which is equivalent to an octave$_{20}$, plus
the generalised fourth and the generalised fifth which, for this scale, are the
diatonic sixth$_{20}$ and seventh$_{20}$, respectively. Chromatic size, measured
in semi-tones$_{20}$, is also indicated for each interval.


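Table 2. Diatonic major intervals in $\mathbb{Z}_{20}$.

| Interval | 1P | 2M | 3M | 4M | 5M | 6P | 7P | 8M | 9M | 10M | 11M | 12P |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Semi-tones | 0 | 2 | 4 | 6 | 8 | 9 | 11 | 13 | 15 | 17 | 19 | 20 |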


Notice that, after defining the corresponding minor intervals, all of the notes
belonging to $\mathbb{Z}_{20}$ will have been used, with the exception of the
tritone $10_{20}$, which may be called augmented sixth$_{20}$ or diminished
seventh$_{20}$, similarly to the tritone $6_{12}$ in $\mathbb{Z}_{12}$, which is
called augmented fourth$_{12}$ or diminished fifth$_{12}$.
Finally, we describe three-note chords in $\mathbb{Z}_{20}$ built by
fourths$_{20}$. Table 3 summarises these chords; degree corresponds to the
fundamental note of the chord, and an indication is provided on whether the
chord is major, minor, or the diminished chord whose fundamental is the leading
note.


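Table 3. Chords in the scale of C$_{20}$ Major in $\mathbb{Z}_{20}$.

| Seventh | 11 | 13 | 15 | 17 | 19 | 0 | 2 | 4 | 6 | 8 | 9 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Fourth | 6 | 8 | 9 | 11 | 13 | 15 | 17 | 19 | 0 | 2 | 4 |
| Fundamental | 0 | 2 | 4 | 6 | 8 | 9 | 11 | 13 | 15 | 17 | 19 |
| Degree | I | II | iii | iv | v | VI | VII | VIII | ix | x | xi° |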
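The rows of Table 3 can be regenerated from the scale alone. In the sketch below, a chord is the fundamental together with the notes three and six diatonic steps above it (the diatonic fourth and seventh). The quality labels are an assumption made by analogy with the 12-note case: a chord whose chromatic sizes are (6, 5) is called major, (5, 6) minor and (5, 5) diminished. The function names are illustrative.

```python
SCALE_20 = [0, 2, 4, 6, 8, 9, 11, 13, 15, 17, 19]

def chord(i, scale=SCALE_20):
    """(fundamental, fourth, seventh) built on the scale degree with index i."""
    d = len(scale)
    return tuple(scale[(i + step) % d] for step in (0, 3, 6))

def quality(ch, c=20):
    """Assumed labels, by analogy with major/minor/diminished triads."""
    lower = (ch[1] - ch[0]) % c
    upper = (ch[2] - ch[1]) % c
    return {(6, 5): "major", (5, 6): "minor", (5, 5): "diminished"}[(lower, upper)]

chords = [chord(i) for i in range(len(SCALE_20))]
qualities = [quality(ch) for ch in chords]
```

Under this labelling the only diminished chord is the one built on the leading note 19, in agreement with the description preceding the table.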


A proposal for a notation with letters, for all 20 chromatic notes, is as follows 
(Table 4): 


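Table 4. Chromatic notes by letter in $\mathbb{Z}_{20}$.

| Note | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Letter | C | C♯ | D | D♯ | E | E♯ | F | F♯ | G | H | H♯ | I | I♯ | J | J♯ | K | K♯ | A | A♯ | B |
| | | D♭ | | E♭ | | F♭ | | G♭ | | | I♭ | | J♭ | | K♭ | | A♭ | | B♭ | |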

We conclude by enumerating all 20 keys in $\mathbb{Z}_{20}$, with their
respective key-signatures, in Table 6. Just as in $\mathbb{Z}_{12}$, they should
be ordered according to the generalised circle of fifths, which in this case is
a circle of sevenths$_{20}$. In Sect. 5 we address the main contribution of this
paper, namely how to modulate from one key to another in the microtonal scale
$\mathbb{Z}_{20}$. But before that, in the next section we review the theory of
modulation in $\mathbb{Z}_{12}$.

4 Symmetry and Modulation in $\mathbb{Z}_{12}$

In [3] and [5], Guerino Mazzola developed a theory that describes modulation
between two keys, or two scales that belong to the same translation class, that
is, two scales that are equivalent under musical transposition. The modulations
computed by Mazzola, for the traditional degrees of relation between keys, agree
with classical theory, so the results can be interpreted as a generalisation of
established music theory. The following definitions are adapted from
Muzzulini [7].

Definition 1. A seven-note scale s is any subset of $\mathbb{Z}_{12}$ which has
seven elements.

In $\mathbb{Z}_{12}$, we can choose an arbitrary note, say $D_1$, as the tonic
of scale s, and consequently we enumerate the notes of scale s as
$D_1, D_2, \ldots, D_7$. In order to consider chords and harmony, we sometimes
refer to diatonic notes as degrees of scale s. For example, note $D_1$ is called
the first degree, note $D_2$ is called the second degree, etc.

Definition 2. For n = 1, ..., 7, define the chord or triad beginning on the
n-th degree of scale s as the set
$s_n = \{D_n, D_{[(n+1) \bmod 7]+1}, D_{[(n+3) \bmod 7]+1}\}$.

That is, $s_1 = \{D_1, D_3, D_5\}$, $s_2 = \{D_2, D_4, D_6\}$, ...,
$s_7 = \{D_7, D_2, D_4\}$. Also, we want to be able to refer to the set of all
chords of scale s:

Definition 3. The covering $\{s_1, s_2, s_3, s_4, s_5, s_6, s_7\}$ of scale s by
its triads is called the triadic interpretation of s, and is denoted by
$s^{(3)}$.

We want to be able to talk about transposing a scale 5 . Of course, this 
corresponds to a translation in z^i 2 . 

Definition 4. Two scales $r$ and $s$ belong to the same translation class if there
exists a translation defined in $\mathbb{Z}_{12}$ that transforms $r$ to $s$.

We will also want to refer to cadences, or more precisely, to sets of chords 
that constitute a cadence, thus called cadential sets. 

Definition 5. A subset $\mu$ of triads in $s^{(3)}$ is a cadential set of scale $s$ if there
does not exist any other scale $r$, in the same translation class as $s$, such that
all the elements of $\mu$ are also triads of $r^{(3)}$. The cadential set $\mu$ is a minimal
cadential set if no proper subset of $\mu$ is a cadential set.

The importance of minimal cadential sets is that they allow us to distinguish among
scales that belong to the same translation class.

Example 1. A major diatonic scale has the following minimal cadential sets:
{ii, iii}, {iii, IV}, {IV, V}, {ii, V} and {vii°}. The set {I, V} is not a cadential
set in C Major because its chords also belong to the triadic interpretation of G
Major. The set {I, IV, V} is a cadential set but it is not minimal (Muzzulini [7]).
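The claim of Example 1 can be checked by brute force. The following is a minimal sketch, not the procedure of [7]: it assumes the C major scale is encoded as pitch classes in $\mathbb{Z}_{12}$, builds the triadic interpretation by stacking scale degrees n, n+2, n+4, and tests every subset of triads against Definition 5. The function names are illustrative choices made here.

```python
from itertools import combinations

MAJOR = [0, 2, 4, 5, 7, 9, 11]          # C major as pitch classes in Z_12
NAMES = ["I", "ii", "iii", "IV", "V", "vi", "vii°"]

def triads(scale):
    """Triadic interpretation: the n-th triad stacks degrees n, n+2, n+4."""
    return [frozenset(scale[(n + j) % 7] for j in (0, 2, 4)) for n in range(7)]

def translate(scale, p, modulus=12):
    """Transpose every note of the scale by p steps."""
    return [(x + p) % modulus for x in scale]

def is_cadential(mu, scale, modulus=12):
    """True if no other translate of `scale` contains every triad of mu."""
    for p in range(1, modulus):
        other = translate(scale, p, modulus)
        if set(other) == set(scale):
            continue  # the translate coincides with the scale itself
        if set(mu) <= set(triads(other)):
            return False
    return True

def minimal_cadential_sets(scale, modulus=12):
    """Search subsets by increasing size, skipping supersets of earlier hits."""
    found = []
    chords = triads(scale)
    for size in range(1, len(chords) + 1):
        for mu in combinations(range(7), size):
            if any(set(m) <= set(mu) for m in found):
                continue  # contains a smaller cadential set, so not minimal
            if is_cadential([chords[i] for i in mu], scale, modulus):
                found.append(mu)
    return found

result = [tuple(NAMES[i] for i in mu) for mu in minimal_cadential_sets(MAJOR)]
print(result)
```

Because the search proceeds by increasing subset size, any cadential set that survives the superset filter is automatically minimal.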

In this context, invertible affine transformations of $\mathbb{Z}_{12}$ onto itself are called
symmetries, and are written as $T_n I_0$, where $n$ is an element of $\mathbb{Z}_{12}$, $T_n$ is a
translation by $n$ and $I_0$ is the inversion (for example, the inverse of 3 is 9). An
example of a translation is:

$T_3 : \{0, 4, 7\} \mapsto \{0+3, 4+3, 7+3\} = \{3, 7, 10\}$,

and an inversion:

$I_0 : \{0, 4, 7\} \mapsto \{0, 8, 5\}$.

Then, an invertible affine transformation is $T_3 I_0$. Now we define symmetries of
a scale $s$:

Definition 6. The internal symmetries of a scale $s$ are the symmetries of $\mathbb{Z}_{12}$
that leave $s$ invariant.

The concept of internal symmetry of a scale can be extended to its triadic
interpretation:

Definition 7. An internal symmetry of the triadic interpretation is an
internal symmetry of $s$ applied to the triads of $s^{(3)}$.

It is shown in [4] that the only non-trivial internal symmetries of a triadic
interpretation are inversions.

Now, in order to present Mazzola’s theory, we need to have a clear concept 
of musical modulation. In his 1911 work Theory of Harmony (Harmonielehre)
[8], Arnold Schönberg defines modulation as a three-step process, as highlighted
by Mazzola. Schönberg’s three steps are:

1. First, some neutral chords, those that are common to both keys, should 
appear; 

2. Then the harmonic pivot progressions that introduce the new key, should 
appear; 

3. Finally, the cadence confirms the new key. 

In analogy with particle physics, Mazzola interprets modulation as the result
of the action of a symmetry based on a modulation quantum $Q$. Explicit
construction of the modulation quantum allows for calculation of the pivots. A
modulator is a transformation $g = T_m f$, where $f$ is an internal symmetry of
$s^{(3)}$, $Q$ is defined as a subset of $\mathbb{Z}_{12}$, and $\mu$ is defined as a minimal cadential
set of the target scale $r$. The following properties result in the modulation
algorithm as it appears in Muzzulini [7], which is inspired by Mazzola [4] and [5].

1. There exists a modulator $g$ for $(s^{(3)}, r^{(3)})$, which is an internal symmetry of $Q$,
that is, $g(Q) = Q$.

2. All triads of $\mu$ are subsets of $Q$.

3. The only internal symmetry of $r^{(3)}$ that is also an internal symmetry of
$\tau = r \cap Q$ is the identity, and $\tau$ is covered by triads of $r^{(3)}$.

4. $Q$ is minimal with respect to properties (1) and (2).

We will exemplify the modulation algorithm by using an exotic scale of seven
notes in $\mathbb{Z}_{12}$.

Example 2. In this example we will use the scale classified as ^62 in [7]: 

$\{C, C\sharp, D, E, F\sharp, G\sharp, A\sharp\} = \{0, 1, 2, 4, 6, 8, 10\}$,

with triadic interpretation:

$s^{(3)} = \{\{0,2,6\}, \{1,4,8\}, \{2,6,10\}, \{4,8,0\}, \{6,10,1\}, \{8,0,2\}, \{10,1,4\}\}$.

The translations of the scale and their triadic interpretations are easily
calculated, which then leads to identifying the minimal cadential sets:

$\{0,2,6\} = \{C, D, F\sharp\}$;

$\{1,4,8\} = \{C\sharp, E, G\sharp\}$;

$\{6,10,1\} = \{F\sharp, A\sharp, C\sharp\}$;

$\{8,0,2\} = \{G\sharp, C, D\}$;

$\{10,1,4\} = \{A\sharp, C\sharp, E\}$.

All of them consist of just one chord; however, this situation arises from the
special characteristics of this scale. It is not that way in the general case.

We present an example of modulation from the key of C to the key of F. We
find that, in this case, the modulator is $g = T_5(T_2 I_0)$, as $m = 5$ and $T_2 I_0$ is an
inner symmetry of the scale. Thus we apply it to the first minimal cadential set:

$T_5(T_2 I_0(\{0,2,6\})) = T_5 T_2(\{0,10,6\}) = T_5(\{2,0,8\}) = \{7,5,1\} = \{1,5,7\}$.

Following the algorithm, we make the union with $g(\{1,5,7\})$, that is,

$\{1,5,7\} \cup T_5(T_2 I_0(\{1,5,7\})) = \{1,5,7\} \cup \{6,2,0\} = \{0,1,2,5,6,7\}$.

This is the modulation quantum $Q$ for the minimal cadential set $\{0,2,6\}$. The
intersection of $Q$ with scale F is $\{1,5,6,7\}$. In this case there is only one pivot:
$\{1,5,7\}$. Once again, the fact that there is only one pivot is a characteristic of
this scale, but does not represent the general situation. In the following section
the modulation algorithm is applied to a microtonal scale.
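The arithmetic of Example 2 can be replayed in a few lines. This is only a sketch of the worked example, not a general implementation of the modulation algorithm: it assumes translations act as $x \mapsto x + n$ and the inversion as $x \mapsto -x$ in $\mathbb{Z}_{12}$, and the helper names `T`, `I0` and `g` are chosen here for illustration.

```python
SCALE = frozenset({0, 1, 2, 4, 6, 8, 10})   # the seven-note scale of Example 2

def T(n, notes, m=12):
    """Translation by n in Z_m."""
    return frozenset((x + n) % m for x in notes)

def I0(notes, m=12):
    """Inversion about 0 in Z_m."""
    return frozenset((-x) % m for x in notes)

def g(notes):
    """Modulator g = T_5 (T_2 I_0): invert, translate by 2, then by 5."""
    return T(5, T(2, I0(notes)))

# T_2 I_0 leaves the scale invariant, so it is an inner symmetry.
assert T(2, I0(SCALE)) == SCALE

cadence = g(frozenset({0, 2, 6}))   # image of the first minimal cadential set
quantum = cadence | g(cadence)      # union with its image under g
target = T(5, SCALE)                # the translated scale on F
trace = quantum & target            # intersection of Q with the target scale

print(sorted(cadence), sorted(quantum), sorted(trace))
```

Since $g$ is an involution here, the union of a set with its image is automatically invariant under $g$, which is property (1) of the algorithm.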

5 Symmetry and Modulation in $\mathbb{Z}_{20}$

In this section we present a version of Mazzola’s theory of modulation adapted to
the microtonal 11-note diatonic scale immersed in $\mathbb{Z}_{20}$; this scale was described
in previous sections. We denote this diatonic scale as $s$ and select $D_1 = 0$ as the
tonic of the scale; the other notes are numbered in increasing order: $D_1, D_2, \ldots,
D_{11}$. As already discussed, in this scale we will use chords constructed by fourths:

Definition 8. For $n = 1, \ldots, 11$, we define a chord or triad on the $n$th degree
as the set $s_n = \{D_n, D_{[(n+2) \bmod 11]+1}, D_{[(n+5) \bmod 11]+1}\}$.

Other concepts are defined in a similar way to the previous section. The
triadic interpretation of scale $s$ is the set of all chords constructed with notes
from $s$. After constructing and comparing the cadential sets, it can be seen that
there are five minimal cadential sets, namely: {iii, v}, {v, VI}, {iii, VIII},
{VI, VIII} and {xi°}.

There are evident similarities with the usual major scale: minimal cadential
sets are combinations of only five chords: two minor chords, two major chords,
and the diminished chord. One minimal cadential set is formed by two minor
chords, another one is formed by two major chords, two are formed by one major
chord and one minor chord and the other one contains only the diminished chord.
We may call subdominant and dominant the two major chords that appear as
elements of the minimal cadential sets. Their importance comes also, as will be
seen, from their repeated appearance as pivots of the modulations. In the case
of the degree VIII (dominant), its harmonic importance is also related to the
fact that this chord contains the leading note $D_{11} = 19$.

The only non-trivial symmetry of $s^{(3)}$ is $T_8 I_0$, which acts on triads as shown
next (Table 5):


Table 5. Transformation of chords under the symmetry $T_8 I_0$ in $\mathbb{Z}_{20}$.

I    ↔  x
II   ↔  ix
iii  ↔  VIII
iv   ↔  VII
v    ↔  VI
xi°  ↔  xi°


There is a clear similarity with the symmetry of the usual major scale: major 
chords are transformed into minor chords and vice versa, while the diminished 
chord is left invariant. 
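The chord correspondence of Table 5 can be reproduced computationally. The sketch below rests on an assumption made here: that the 11-note scale consists of the pitch classes 0, 2, 4, 6, 8, 9, 11, 13, 15, 17, 19 (the natural notes C, D, E, F, G, H, I, J, K, A, B as they can be read off Table 6). It searches for the inversions $x \mapsto n - x$ that leave this scale invariant and then applies the one it finds to the triads of Definition 8.

```python
SCALE = [0, 2, 4, 6, 8, 9, 11, 13, 15, 17, 19]   # assumed D_1..D_11 in Z_20
DEGREES = ["I", "II", "iii", "iv", "v", "VI", "VII", "VIII", "ix", "x", "xi°"]

def chord(n):
    """Triad on degree n (1-based): D_n, D_[(n+2) mod 11]+1, D_[(n+5) mod 11]+1."""
    return frozenset({SCALE[n - 1], SCALE[(n + 2) % 11], SCALE[(n + 5) % 11]})

def invert(n, notes):
    """The symmetry T_n I_0, i.e. x -> n - x in Z_20."""
    return frozenset((n - x) % 20 for x in notes)

# Every n for which T_n I_0 maps the scale onto itself.
symmetric_inversions = [n for n in range(20)
                        if invert(n, SCALE) == frozenset(SCALE)]

# Name each chord, then see where the inversion T_8 I_0 sends it.
CHORDS = {chord(n): DEGREES[n - 1] for n in range(1, 12)}
mapping = {DEGREES[n - 1]: CHORDS[invert(8, chord(n))] for n in range(1, 12)}

print(symmetric_inversions)
for source, image in mapping.items():
    print(f"{source:>5} -> {image}")
```

Under the stated assumption about the scale, the search returns a single inversion, and the printed mapping pairs each major chord with a minor chord while fixing the diminished chord.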

In the case of the usual major scale, the internal symmetry of the scale
transforms the C major chord into the A minor chord, its relative minor. For
the 11-note diatonic scale $s$, immersed within $\mathbb{Z}_{20}$, the internal symmetry $T_8 I_0$
transforms the major chord whose fundamental is the degree I into the minor
chord whose fundamental is the degree x. This is the main reason why we
propose that the mode that begins on the tenth degree (the note $17_{20}$) should be
called the relative minor of $s$. But what justifies that the major mode of the
scale should begin with the note $0_{20}$? The answer lies in the similarities that
exist between this mode and the major mode of the usual diatonic scale. To
be specific: the structure of the hexachords (generalised tetrachords) and the
position of the semitones, which allow for the existence of the diminished chord
whose fundamental note is the leading note.

We have computed the modulation quanta for modulations from the scale
$s$ to its translations $r = T_p(s)$, for $p = 1, 2, \ldots, 19$. Table 7 shows all quantised
modulations. First of all, note that, contrary to the case of $\mathbb{Z}_{12}$, in $\mathbb{Z}_{20}$ not
every translation of $s$ admits a quantised modulation: there are no quantised
modulations for values of $p$ equal to 8 or 12.

For each value of $p$ and for each quantised modulation from $s$ to $r = T_p(s)$,
Table 7 shows the minimal cadential sets $\mu$ along with the modulator $g$. Then,
for each minimal cadential set, if there is a quantum $Q$ that fulfills the required
properties, it is shown on the same row, thus describing a quantised modulation.
Notice that, in the notation used for the modulation quantum $Q$ and its trace
$\tau = Q \cap r$, the number 0 corresponds to the tonic of the departure scale $s$. On the
other hand, cadential sets $\mu$ and pivots are indicated as degrees of $r$, the target
scale of the modulation.

Table 7 gives complete information on how to modulate. First, one should
choose a target scale $r = T_p(s)$ for which there exists at least one quantised
modulation. Then, choose a cadential set, if there is more than one possibility.
The table provides the pivot chords that should be used for the transition
progression, before confirming the new key with the cadence.

It is interesting to note that for $p = 9$ and $p = 11$ there is a
quantised modulation for each of the five minimal cadential sets. This maximises
the number of alternative cadences to modulate to these two scales. Notice that
notes 9 and 11 correspond to the degrees VI and VII, respectively, and they
are the tonics of the two keys which are closest to $s$, if distance is measured by
the circle of sevenths, which corresponds to a generalised circle of fifths in $\mathbb{Z}_{20}$.


Table 6. Key signatures for $\mathbb{Z}_{20}$.

Key     | Signature
--------|----------------------------------------------------------------------------
C = 0   | (none)
I = 11  | H♯=10
D = 2   | H♯=10, C♯=1
J = 13  | H♯=10, C♯=1, I♯=12
E = 4   | H♯=10, C♯=1, I♯=12, D♯=3
K = 15  | H♯=10, C♯=1, I♯=12, D♯=3, J♯=14
F = 6   | H♯=10, C♯=1, I♯=12, D♯=3, J♯=14, E♯=5
A = 17  | H♯=10, C♯=1, I♯=12, D♯=3, J♯=14, E♯=5, K♯=16
G = 8   | H♯=10, C♯=1, I♯=12, D♯=3, J♯=14, E♯=5, K♯=16, F♯=7
B = 19  | H♯=10, C♯=1, I♯=12, D♯=3, J♯=14, E♯=5, K♯=16, F♯=7, A♯=18
H♯ = 10 | H♯=10, C♯=1, I♯=12, D♯=3, J♯=14, E♯=5, K♯=16, F♯=7, A♯=18, G♯=9
D♭ = 1  | I♭=10, D♭=1, J♭=12, E♭=3, K♭=14, F♭=5, A♭=16, G♭=7, B♭=18
J♭ = 12 | D♭=1, J♭=12, E♭=3, K♭=14, F♭=5, A♭=16, G♭=7, B♭=18
E♭ = 3  | J♭=12, E♭=3, K♭=14, F♭=5, A♭=16, G♭=7, B♭=18
K♭ = 14 | E♭=3, K♭=14, F♭=5, A♭=16, G♭=7, B♭=18
F♭ = 5  | K♭=14, F♭=5, A♭=16, G♭=7, B♭=18
A♭ = 16 | F♭=5, A♭=16, G♭=7, B♭=18
G♭ = 7  | A♭=16, G♭=7, B♭=18
B♭ = 18 | G♭=7, B♭=18
H = 9   | B♭=18

Table 7. Quantised modulations in $\mathbb{Z}_{20}$.

p  | μ          | Q                                | g        | Pivots                           | τ = Q ∩ r
---|------------|----------------------------------|----------|----------------------------------|------------------------------
1  | (iii,v)    | (0,4,5,9,10,13,14,15,16,19)      | T_9 I_0  | (iii,v,VIII,xi°)                 | (5,9,10,14,16,0)
2  | (iii,v)    | (0,1,4,6,9,10,11,13,15,17,19)    | T_10 I_0 | (II,iii,v,VII,VIII,x,xi°)        | (4,6,10,11,13,15,17,19,1)
   | (v,VI)     | (0,1,2,8,9,10,11,13,15,17,19)    | T_10 I_0 | (I,iv,v,VI,ix)                   | (2,8,10,11,13,15,17,19,1)
   | (iii,VIII) | (1,4,6,9,11,13,15,17,19)         | T_10 I_0 | (iii,VII,VIII,xi°)               | (4,6,11,13,15,17,19,1)
   | (VI,VIII)  | (1,2,4,6,8,9,11,13,15,17,19)     | T_10 I_0 | (I,iii,iv,VI,VII,VIII,ix,xi°)    | (2,4,6,8,11,13,15,17,19,1)
3  | (VI,VIII)  | (2,3,4,7,8,9,12,13,15,16,18,19)  | T_11 I_0 | (iii,VI,VIII,ix,xi°)             | (3,7,9,12,16,18,2)
4  | (xi°)      | (3,4,8,9,13,19)                  | T_12 I_0 | (iii,VI,xi°)                     | (4,8,13,19,3)
5  | (iii,v)    | (0,4,9,13,14,15,18,19)           | T_13 I_0 | (iii,v,VIII,xi°)                 | (9,13,14,18,0,4)
   | (iii,VIII) | (0,4,9,13,14,15,18,19)           | T_13 I_0 | (iii,v,VIII,xi°)                 | (9,13,14,18,0,4)
6  | (iii,v)    | (0,1,4,5,9,10,13,14,15,19)       | T_14 I_0 | (iii,v,VIII,xi°)                 | (10,14,15,19,1,5)
   | (v,VI)     | (0,1,5,6,8,9,13,14,15,19)        | T_14 I_0 | (II,v,VI)                        | (6,8,14,15,19,1,5)
   | (xi°)      | (4,5,9,10,15,19)                 | T_14 I_0 | (VIII,xi°)                       | (10,15,19,5)
7  | (VI,VIII)  | (0,2,4,6,7,8,9,11,13,15,16,19)   | T_15 I_0 | (II,iii,v,VI,VIII,ix,x,xi°)      | (7,9,11,13,15,16,0,2,4,6)
8  | -          | -                                | T_16 I_0 | -                                | -
9  | (iii,v)    | (0,2,4,8,9,13,15,17,18,19)       | T_17 I_0 | (I,iii,v,VI,VIII,ix,xi°)         | (9,13,15,17,18,0,2,4,8)
   | (v,VI)     | (0,2,4,8,9,13,15,17,18,19)       | T_17 I_0 | (I,iii,v,VI,VIII,ix,xi°)         | (9,13,15,17,18,0,2,4,8)
   | (iii,VIII) | (2,4,8,9,13,15,18,19)            | T_17 I_0 | (iii,VI,VIII,ix,xi°)             | (9,13,15,18,2,4,8)
   | (VI,VIII)  | (2,4,8,9,13,15,18,19)            | T_17 I_0 | (iii,VI,VIII,ix,xi°)             | (9,13,15,18,2,4,8)
   | (xi°)      | (4,8,9,13,18,19)                 | T_17 I_0 | (iii,VI,xi°)                     | (9,13,18,4,8)
10 | (iii,v)    | (0,3,4,5,9,13,14,15,18,19)       | T_18 I_0 | (iii,v,VIII,xi°)                 | (14,18,19,3,5,9)
   | (VI,VIII)  | (3,4,5,8,9,10,13,14,15,19)       | T_18 I_0 | (iii,VI,VIII,xi°)                | (10,14,19,3,5,9)
   | (iii,v)    | (3,4,5,8,9,13,14,15,18,19)       | T_10     | (iii,v,VIII,xi°)                 | (14,18,19,3,5,9)
   | (VI,VIII)  | (0,3,4,5,9,10,13,14,15,19)       | T_10     | (iii,VI,VIII,xi°)                | (10,14,19,3,5,9)
11 | (iii,v)    | (0,4,6,9,10,13,15,19)            | T_19 I_0 | (II,iii,v,VIII,xi°)              | (13,15,19,0,4,6,10)
   | (v,VI)     | (0,4,6,8,9,10,11,13,15,19)       | T_19 I_0 | (II,iii,v,VI,VIII,x,xi°)         | (11,13,15,19,0,4,6,8,10)
   | (iii,VIII) | (0,4,6,9,10,13,15,19)            | T_19 I_0 | (II,iii,v,VIII,xi°)              | (13,15,19,0,4,6,10)
   | (VI,VIII)  | (0,4,6,8,9,10,11,13,15,19)       | T_19 I_0 | (II,iii,v,VI,VIII,x,xi°)         | (11,13,15,19,0,4,6,8,10)
   | (xi°)      | (0,4,9,10,15,19)                 | T_19 I_0 | (v,VIII,xi°)                     | (15,19,0,4,10)
12 | -          | -                                | T_0 I_0  | -                                | -
13 | (iii,v)    | (0,1,2,4,6,8,9,12,13,15,17,19)   | T_1 I_0  | (I,II,iii,v,VI,VIII,ix,xi°)      | (13,15,17,19,1,2,4,6,8,12)
14 | (v,VI)     | (0,2,3,7,8,9,13,14,15,19)        | T_2 I_0  | (v,VI,ix)                        | (14,0,2,3,7,9,13)
   | (VI,VIII)  | (3,4,7,8,9,13,14,15,18,19)       | T_2 I_0  | (iii,VI,VIII,xi°)                | (14,18,3,7,9,13)
   | (xi°)      | (3,4,9,13,18,19)                 | T_2 I_0  | (iii,xi°)                        | (18,3,9,13)
15 | (iii,VIII) | (4,8,9,10,13,14,15,19)           | T_3 I_0  | (iii,VI,VIII,xi°)                | (15,19,4,8,10,14)
   | (VI,VIII)  | (4,8,9,10,13,14,15,19)           | T_3 I_0  | (iii,VI,VIII,xi°)                | (15,19,4,8,10,14)
16 | (xi°)      | (0,4,5,9,15,19)                  | T_4 I_0  | (v,VIII,xi°)                     | (0,4,5,9,15)
17 | (iii,v)    | (0,1,4,5,6,9,10,12,13,15,16,19)  | T_5 I_0  | (II,iii,v,VIII,xi°)              | (19,1,5,6,10,12,16)
18 | (iii,v)    | (0,2,4,6,7,9,11,13,15,17,19)     | T_6 I_0  | (II,iii,iv,v,VII,VIII,x,xi°)     | (0,2,4,6,7,9,11,13,15,17)
   | (v,VI)     | (0,6,7,8,9,11,13,15,17,18,19)    | T_6 I_0  | (II,v,VI,VII,x)                  | (18,0,6,7,9,11,13,15,17)
   | (iii,VIII) | (2,4,7,9,11,13,15,17,19)         | T_6 I_0  | (iii,iv,VIII,xi°)                | (2,4,7,9,11,13,15,17)
   | (VI,VIII)  | (2,4,7,8,9,11,13,15,17,18,19)    | T_6 I_0  | (I,iii,iv,VI,VIII,ix,xi°)        | (18,2,4,7,9,11,13,15,17)
19 | (iii,VIII) | (3,4,8,9,12,13,14,15,18,19)      | T_7 I_0  | (iii,VI,VIII,xi°)                | (19,3,8,12,14,18)
   | (VI,VIII)  | (3,4,8,9,12,13,14,15,18,19)      | T_7 I_0  | (iii,VI,VIII,xi°)                | (19,3,8,12,14,18)

Abstract. The combinatorial theory of difference sets has prior applications
in the field of mathematical music theory. The theory of almost difference
sets, however, has not received similar attention from music scholars.
Nevertheless, these types of structures also have significant musical
applications. For instance, the well-known all-interval tetrachords of
pitch-class set theory are almost difference sets. To that end, we
investigate the various categories of almost difference sets (cyclic, abelian,
and non-abelian) in terms of their representations in Lewinian
music-transformational groups.


Keywords: Difference set • Almost difference set • Flat interval distribution •
All-interval chord • Generalized Interval System


1 Introduction 

The combinatorial theory of difference sets has prior applications in the field of
mathematical music theory. For instance, Gamer and Wilson [1] relate various
n-chords in microtonal systems to difference sets. Wild [2] generalizes this idea
further to flat-interval distributions. The theory of almost difference sets,
however, has not received similar attention from music scholars. Nevertheless, these
types of structures also have significant musical applications. For instance, the
well-known all-interval tetrachords of pitch-class set theory [3] are almost
difference sets. To that end, we investigate the various categories of almost difference
sets (cyclic, non-cyclic abelian, and non-abelian) in terms of their representations
in Lewinian music-transformational groups [4].

2 Mathematical Preliminaries 

2.1 Difference Sets 

Before proceeding to the concept of almost difference sets, let us first establish
what a difference set is.

Definition 1. Let $G$ be a finite, additively notated group of order $v$, and let $D$
be a $k$-member subset of $G$. $D$ is a $(v, k, \lambda)$ difference set (DS) in $G$ if every
non-identity element of $G$ can be written as a difference $d_1 - d_2$ of elements
$d_1, d_2 \in D$ in exactly $\lambda$ ways. (In multiplicatively notated groups, we use the
product $d_1 d_2^{-1}$ as an analog for the difference $d_1 - d_2$.) We call $n = k - \lambda$ the
order of $D$.

Example 1. Put $G = \mathbb{Z}/7\mathbb{Z}$, observing that $G$ is a group of order $v = 7$. Let
$D = \{0, 1, 3\}$ be a subset of $G$ of cardinality $k = 3$. We note that every
non-identity element of $G$ can be expressed in exactly $\lambda = 1$ way as a difference of
elements $d_1, d_2 \in D$.

1 − 0 = 1 (modulo 7)
3 − 1 = 2
3 − 0 = 3
0 − 3 = 4
1 − 3 = 5
0 − 1 = 6

Hence, $D$ is a (7, 3, 1) DS of order $n = 2$.
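Definition 1 translates directly into a small test for cyclic groups. The following is a minimal sketch restricted to $\mathbb{Z}/v\mathbb{Z}$ (it does not cover the non-cyclic or non-abelian cases), with function names chosen here for illustration; it recovers $\lambda = 1$ for Example 1.

```python
from collections import Counter

def difference_counts(D, v):
    """Count how often each residue mod v arises as d1 - d2 with d1 != d2."""
    return Counter((a - b) % v for a in D for b in D if a != b)

def ds_lambda(D, v):
    """Return lambda if D is a difference set in Z/vZ, otherwise None."""
    counts = difference_counts(D, v)
    values = {counts.get(g, 0) for g in range(1, v)}
    return values.pop() if len(values) == 1 else None

D, v = {0, 1, 3}, 7
lam = ds_lambda(D, v)
n = len(D) - lam          # order n = k - lambda
print(lam, n, n * n + n + 1 == v)
```

The final comparison checks the relation $v = n^2 + n + 1$ that holds when $\lambda = 1$. The same subset taken modulo 6 is not a difference set, because the element 3 then occurs twice as a difference.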

Below are some properties of DSs that have relevance to our later results. 

Definition 2. A DS is cyclic, abelian, or non-abelian, in accordance with 
the property of the particular group G that contains it as a subset. 

We thereby make a distinction between cyclic DSs and other, non-cyclic abelian
DSs by referring to the latter merely as abelian, following the convention in
combinatorics [5]. Further, the following property is of special significance to
our musical applications below.

Definition 3. If $\lambda = 1$, we call $D$ a planar difference set (PDS).

For instance, $D$ in Example 1 above is a PDS, as each non-identity element of
$G$ occurs only once as a difference of elements in $D$. This particular $D$ is also
cyclic, because $\mathbb{Z}/7\mathbb{Z}$ is a cyclic group.

Theorem 1. $\lambda = 1 \implies v = n^2 + n + 1$.

Proof. Let $G$ be a group of order $v$, and let $G^{\#}$ be the set of non-identity elements
of $G$. We note that, for a DS $D$ of size $k$ in $G$, there exist exactly ${}_kP_2 = 2\binom{k}{2}$
possible differences $g - h$, where $g, h \in D$ and $g \neq h$. If $\lambda = 1$, then each
non-identity element of $G$ appears once and only once as a difference of elements in
$D$; therefore, $G^{\#}$ is also of size ${}_kP_2$. As $n = k - 1$ (using $n = k - \lambda$ and $\lambda = 1$)
and ${}_kP_2 = k(k-1)$, we observe that $k(k-1) = (k-1)^2 + (k-1) = n^2 + n$.
Adding back the identity element gives the order of $G$ as $v = n^2 + n + 1$. □

Theorem 2. If $D$ is a PDS, then $G$ cannot contain an involution.

Proof. For the condition $\lambda = 1$ to hold, there must exist some $d_1, d_2 \in D$ for
every non-identity element $g \in G$ such that $d_1 d_2^{-1} = g$, but for which

$d_2 d_1^{-1} \neq g$.

If, however, $g = g^{-1}$, then $d_2 d_1^{-1} = (d_1 d_2^{-1})^{-1} = g^{-1} = g$, contradicting the
above statement. □

DSs may relate to one another in various structural ways. 

Definition 4. Let $D_1$ be a DS in a group $G_1$, and let $D_2$ be a DS in a group
$G_2$. Then, we say that $D_1$ is isomorphic to $D_2$ if they share the same $(v, k, \lambda)$
parameters. Further, we say that $D_1$ is equivalent to $D_2$ if they share the same
$(v, k, \lambda)$ parameters and if $G_1$ is related to $G_2$ by a group isomorphism.¹

It is possible that two non-equivalent DSs $D_1$ and $D_2$ can be isomorphic to one
another while also not sharing the same property from Definition 2. For example,
one finds a (21, 5, 1) DS in $\mathbb{Z}/21\mathbb{Z}$ and another (21, 5, 1) DS in $\mathbb{Z}/7\mathbb{Z} \rtimes \mathbb{Z}/3\mathbb{Z}$.
Whereas these DSs have different properties (one is cyclic and the other is
non-abelian), they are isomorphic because they share the same $(v, k, \lambda)$ parameters.

Definition 5. Let $D_1$ and $D_2$ be DSs in an additive group $G$. Then, $D_2$ is
a $G$-translate of $D_1$ if $D_2 = \{d + g \mid d \in D_1\}$ for some $g \in G$. (We use
$D_2 = \{dg \mid d \in D_1\}$ in multiplicative groups.)

Any two DSs $D_1$ and $D_2$ that are $G$-translates of one another possess the same
$(v, k, \lambda)$ parameters. Therefore, as $G$ is trivially isomorphic (as a group) to itself,
$D_1$ and $D_2$ are equivalent (as DSs).


2.2 Almost Difference Sets 

An almost difference set is a related combinatorial structure. 

Definition 6. An almost difference set (ADS) is a subset $D = (v, k, \lambda, t)$
of $G$, where $v$ and $k$ are the same as in Definition 1; $t$ non-identity elements
of $G$ appear exactly $\lambda$ times as differences $g - h$ of elements $g, h \in D$; and the
remaining $v - 1 - t$ non-identity elements of $G$ appear $\lambda + 1$ times as differences.
(Again, in multiplicatively notated groups, we use the product $gh^{-1}$ as an analog
for the difference $g - h$.) Similar to DSs, we give the order of $D$ as $n = k - \lambda$.

¹ In the field of combinatorics, the use of the term isomorphism in connection with
difference sets derives from the use of the same term in the theory of balanced
incomplete block designs (BIBDs), as a $(v, k, \lambda)$ difference set $D$ is equivalent to a
symmetric $(v, k, \lambda)$-design [5, Theorem 18.6]. Two BIBDs are considered isomorphic
if there exists a bijection from one design to another such that, if we rename every
point in one design with its image in the other, the collection of blocks in the first
is transformed into that of the second.

In essence, an ADS is a generalization of a DS, wherein the latter is an ADS
with $t = 0$ or $t = v - 1$ [6]. We call ADSs with $0 < t < v - 1$ proper ADSs, to
differentiate them from ADSs that are also DSs (improper ADSs).

Example 2. Put $G = \mathbb{Z}/6\mathbb{Z}$; $G$ is of order $v = 6$. Let $D = \{0, 1, 3\}$ be a subset
of $G$ of cardinality $k = 3$. We note that $t = 4$ non-identity elements of $G$ can be
expressed in exactly $\lambda = 1$ way as differences of elements $d_1, d_2 \in D$, whereas
the remaining $v - 1 - t = 1$ non-identity element of $G$ can be expressed as a
difference of elements $d_1, d_2 \in D$ in exactly $\lambda + 1 = 2$ ways.

1 − 0 = 1 (modulo 6)
3 − 1 = 2
3 − 0 = 0 − 3 = 3
1 − 3 = 4
0 − 1 = 5

Therefore, $D$ is a (6, 3, 1, 4) ADS.

Definition 7. As with DSs, an ADS $D$ can be cyclic, abelian, or non-abelian.
A $G$-translate $Dg$, for some $g \in G$, of a $(v, k, \lambda, t)$ ADS is also
a $(v, k, \lambda, t)$ ADS. Further, ADSs can be related in various ways: they may be
isomorphic or equivalent, etc., as described above for DSs.

Definition 8. We call an ADS with $\lambda = 1$ a planar almost difference set
(PADS).

Recalling that ADSs are generalizations of DSs, we offer the following result
concerning PADSs as a corollary to Theorem 1.

Corollary 1. $\lambda = 1 \implies v = n^2 + n + 1 - \frac{n^2 + n - t}{2}$.

However, an important distinction that relates to our later results exists between
PDSs and PADSs. Whereas Theorem 2 states that a PDS cannot contain an
involution, PADSs may contain involutions, as we see in Example 2 above.

Much of the research on ADSs deals with questions of existence or with
applications in various branches of engineering, including cryptography, coding
theory, and CDMA communications. Whereas many open questions remain,
particularly regarding existence, the following are considered basic properties [7].
The first calculates the number of differences in an ADS in two ways.

Theorem 3. Let $D$ be an ADS of size $k$ in a group $G$ of order $v$. Then, we
observe the following:

$k(k - 1) = \lambda t + (\lambda + 1)(v - 1 - t)$.

The second considers the complement of an abelian ADS. 

Theorem 4. If D is (v,k, A) ADS in an abelian group G with \D\ < then 
D' = G \ D is a (v, v — k, v — 2k + A, t) ADS. 
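The two basic properties can be spot-checked numerically in the cyclic case. The sketch below is an illustration under assumptions, not a proof: it works only in $\mathbb{Z}/v\mathbb{Z}$, the function names are hypothetical, and for a genuine difference set it reports $t = v - 1$ by convention. It derives the ADS parameters of a subset, tests the counting identity of Theorem 3, and compares a complement with the parameters $(v, v - k, v - 2k + \lambda, t)$.

```python
from collections import Counter

def ads_parameters(D, v):
    """Return (v, k, lam, t) if D is an almost difference set in Z/vZ, else None."""
    counts = Counter((a - b) % v for a in D for b in D if a != b)
    # freq[c] = number of non-identity elements occurring exactly c times
    freq = Counter(counts.get(g, 0) for g in range(1, v))
    lam = min(freq)
    if set(freq) - {lam, lam + 1}:
        return None  # more than two distinct multiplicities, or a gap
    return (v, len(D), lam, freq[lam])

def theorem3_holds(v, k, lam, t):
    """Both sides count the ordered pairs of distinct elements of D."""
    return k * (k - 1) == lam * t + (lam + 1) * (v - 1 - t)

def complement(D, v):
    return set(range(v)) - set(D)

example2 = ads_parameters({0, 1, 3}, 6)
tetrachord = ads_parameters({0, 1, 4, 6}, 12)
tetrachord_complement = ads_parameters(complement({0, 1, 4, 6}, 12), 12)
print(example2, tetrachord, tetrachord_complement)
```

For the tetrachord {0, 1, 4, 6} in $\mathbb{Z}/12\mathbb{Z}$ the complement has eight elements, and the computed parameters can be compared with $v - 2k + \lambda = 12 - 8 + 1 = 5$.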
3 Musical Preliminaries

3.1 Generalized Interval Systems 

The music-theoretical context we employ in this work is that presented in David 
Lewin’s Generalized Musical Intervals and Transformations. Following Lewin [4, 
p. 26], we define a Generalized Interval System as follows. 

Definition 9. A Generalized Interval System (GIS) is an ordered triple
$(S, G, \mathrm{int})$, where the set $S$ is the space of the GIS; $G = (G, *)$ is the group of
intervals; and $\mathrm{int} : S \times S \to G$ is the interval function, satisfying the following
two conditions:

(a) For all $r, s, t \in S$, $\mathrm{int}(r, s) * \mathrm{int}(s, t) = \mathrm{int}(r, t)$,

(b) For all $s \in S$ and $g \in G$, there exists a unique $t \in S$ which lies the interval $g$
from $s$. In other words, there exists a unique $t \in S$ such that $\mathrm{int}(s, t) = g$.

In essence, a GIS consists of the simply transitive (regular) action of a group $G$
on a set $S$, wherein an interval is an element of $G$.

Example 3. Let $S$ be the space of twelve chromatic pitch classes as represented
by the integers modulo 12, and let $G = \mathbb{Z}/12\mathbb{Z}$ be a group of intervals with an
action on $S$. Further, let int be the interval function that maps $(s, t) \in S \times S$
to $g = t - s$ (modulo 12) in $G$. The reader can easily verify that this example
satisfies conditions (a) and (b) of Definition 9.
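The verification left to the reader in Example 3 is small enough to carry out exhaustively. A minimal sketch, assuming pitch classes and intervals are both represented by the integers 0 to 11 and that the group operation is addition modulo 12:

```python
S = range(12)   # pitch classes
G = range(12)   # intervals in Z/12Z

def interval(s, t):
    """The interval function int(s, t) = t - s (modulo 12)."""
    return (t - s) % 12

# Condition (a): int(r, s) * int(s, t) = int(r, t) for all r, s, t.
cond_a = all(
    (interval(r, s) + interval(s, t)) % 12 == interval(r, t)
    for r in S for s in S for t in S
)

# Condition (b): for each s and g there is exactly one t with int(s, t) = g.
cond_b = all(
    sum(1 for t in S if interval(s, t) == g) == 1
    for s in S for g in G
)

print(cond_a, cond_b)
```

Both checks enumerate every case, so they confirm the two conditions for this particular GIS only.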


3.2 Intervals and Differences 

Definition 10. An interval is an element $g \in G$. If $g$ is the unique element
of $G$ for which $g : s \mapsto t$ for some $s, t \in S$ (by Definition 9), then we call
$(s, t) \in S \times S$ an occurrence of $g$.

A significant relationship exists between the concepts of intervals and
differences. To facilitate their comparison, we consider the GIS formed by the simply
transitive action of a group $G$ on itself under addition (in the abelian case), or
under (right) multiplication (in the non-abelian case). In such a context, $G$ acts
as both interval group and space. For any $g, h$ in an abelian group $G$, $i = h - g$
is the interval that carries $g$ to $h$. Here, $i$ is construed in the same manner as the
difference $d = h - g$. In the non-abelian case, however, we reckon the interval
from $g$ to $h$ as $i = g^{-1}h$. This notation differs from the multiplicative analog
for a difference that we use above in Definition 1: i.e., $d = hg^{-1}$. Therefore, we
require an additional layer of structure in relating non-abelian intervals with
differences.

Definition 11. Let $(G, *)$ be a group. Then, $(G^{opp}, *')$ is the opposite group
for $G$ if $G^{opp}$, as a set, is the same as $G$, and if $g *' h = h * g$ for all $g, h \in G$.

Theorem 5. Put $\phi : G \to G$, where $\phi(g) = g^{-1}$ for all $g \in G$. Then, $\phi' : G \to
G^{opp}$, where $\phi'(g) = \phi(g)$, is an isomorphism.

Proof. First, we show that $\phi$ is an anti-automorphism of $G$. As $\phi(g) * \phi(h) =
g^{-1} * h^{-1}$, and $\phi(h * g) = g^{-1} * h^{-1}$, we note that $\phi(g) * \phi(h) = \phi(h * g)$ for all
$g, h \in G$. Hence, $\phi$ is an anti-homomorphism. Then, by the group axiom that
stipulates the existence of a unique inverse for every $g \in G$, $\phi$ is a bijection.
Therefore, $\phi$ is an anti-automorphism.

Next, as

$\phi'(g) *' \phi'(h) = \phi(g) *' \phi(h)$   (by $\phi'(g) = \phi(g)$)
$\qquad = \phi(h) * \phi(g)$   (by $g *' h = h * g$)
$\qquad = \phi(g * h)$   (by def. of anti-automorphism)
$\qquad = \phi'(g * h)$,   (by $\phi'(g) = \phi(g)$)

we observe that $\phi'$ is an isomorphism from $G$ to $G^{opp}$. □

Thus, the set of intervals in $G$ and the set of differences in $G$ are images of
one another under $\phi'$. Because of this structural identity, we henceforth speak
informally of intervals and differences as being equivalent, without invoking the
isomorphism.

Example 4. Let the set $S = \{T_0, T_3, T_6, T_9, I_1, I_4, I_7, I_{10}\}$ of pitch-class
operations be the space for our GIS. We note that $S$ has a simply transitive action on
itself under right multiplication, isomorphic to the non-abelian dihedral group
of order 8; this action constitutes the interval group $G$ for the GIS. Put $g = T_3$
and $h = I_4$. Then, $i = g^{-1}h = T_3^{-1} I_4 = I_7$ is the unique element of $G$ that
satisfies the equation $g * i = h$. That is, $i = I_7$ is the interval from $g$ to $h$. (We
note that $i \neq d$, where $d = hg^{-1} = I_4 T_3^{-1} = I_1$ is the multiplicative analog to
the difference $h - g$.)
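The two products in Example 4 can be recomputed by modelling each operation as an affine map on $\mathbb{Z}/12\mathbb{Z}$. The sketch below makes two assumptions that are stated here rather than taken from the text: $T_n$ is encoded as $x \mapsto x + n$ and $I_n$ as $x \mapsto n - x$, and a product $fg$ is read left to right (apply $f$ first, then $g$), which is the reading under which the values given in the example come out.

```python
def T(n):
    """Transposition T_n: x -> x + n, stored as the pair (slope, offset)."""
    return (1, n % 12)

def I(n):
    """Inversion I_n: x -> n - x, stored as the pair (slope, offset)."""
    return (-1, n % 12)

def mul(f, g):
    """Left-to-right product: apply f first, then g (assumed orthography)."""
    (a1, b1), (a2, b2) = f, g
    return (a1 * a2, (a2 * b1 + b2) % 12)

def inv(f):
    """Inverse of x -> a*x + b when a is +1 or -1."""
    a, b = f
    return (a, (-a * b) % 12)

S = [T(0), T(3), T(6), T(9), I(1), I(4), I(7), I(10)]
g, h = T(3), I(4)

i = mul(inv(g), h)    # interval from g to h, i = g^-1 h
d = mul(h, inv(g))    # multiplicative analog of the difference, d = h g^-1

closed = all(mul(a, b) in S for a in S for b in S)
abelian = all(mul(a, b) == mul(b, a) for a in S for b in S)
print(i, d, closed, abelian)
```

The closure and commutativity checks confirm that the eight operations form a non-abelian group under this product, consistent with the dihedral group of order 8 named in the example.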

3.3 Total Intervallic Content and Interval Vector 

We are interested in the total intervallic content of a subset $D \subset S$ in a GIS:
a tally of the number of occurrences of each interval in $G$ that obtains between
the members of $D$. For that purpose, we utilize the interval vector of $D$.

Definition 12. Let $G$ be an interval group of order $v$ with a regular action on
a set $S$, and let $D \subset S$. The interval vector of $D$, $IV(D) = (i_1, i_2, \ldots, i_v)$,
is a $v$-member array that tallies the number of occurrences in $D \times D$ of each
directed interval in $G$. In interval groups for which the constituent intervals are
measurable (such as in a vector space), it is customary to list the coordinates
of the interval vector in order of increasing intervallic size, beginning with the
unison (identity) interval (see [8]).

Example 5. Let us use the same GIS as in Example 3, which incorporates the
simply transitive action of the interval group $G = \mathbb{Z}/12\mathbb{Z}$ on the set of twelve
pitch-class integers $S = \{0, 1, 2, \ldots, 11\}$. The subset $D = \{0, 1, 2, 3, 4, 6, 8, 9\}$ of $S$
has the following total interval content.

0 − 0 = 1 − 1 = 2 − 2 = 3 − 3 = 4 − 4 = 6 − 6 = 8 − 8 = 9 − 9 = 0
1 − 0 = 2 − 1 = 3 − 2 = 4 − 3 = 9 − 8 = 1
2 − 0 = 3 − 1 = 4 − 2 = 6 − 4 = 8 − 6 = 2
3 − 0 = 4 − 1 = 6 − 3 = 9 − 6 = 0 − 9 = 3
4 − 0 = 6 − 2 = 8 − 4 = 0 − 8 = 1 − 9 = 4
6 − 1 = 8 − 3 = 9 − 4 = 1 − 8 = 2 − 9 = 5
6 − 0 = 8 − 2 = 9 − 3 = 0 − 6 = 2 − 8 = 3 − 9 = 6
1 − 6 = 3 − 8 = 4 − 9 = 8 − 1 = 9 − 2 = 7
0 − 4 = 2 − 6 = 4 − 8 = 8 − 0 = 9 − 1 = 8
0 − 3 = 1 − 4 = 3 − 6 = 6 − 9 = 9 − 0 = 9
0 − 2 = 1 − 3 = 2 − 4 = 4 − 6 = 6 − 8 = 10
0 − 1 = 1 − 2 = 2 − 3 = 3 − 4 = 8 − 9 = 11

We note that $D$ has 8 occurrences of the identity interval 0,² 6 occurrences of
the interval 6, and 5 occurrences of each of the remaining intervals in $G$. Hence,

$IV(D) = (8, 5, 5, 5, 5, 5, 6, 5, 5, 5, 5, 5)$,

where the vector’s first coordinate represents the number of occurrences of
interval size 0; the second, interval size 1; the third, 2; and so on.


4 PADSs in GISs 

4.1 FLIDs and NFLIDs 

Sets of musical objects that have flat interval distributions, or FLIDs, are DSs,
as the number of occurrences of every non-identity interval in a FLID’s interval
vector is equal to some integer $\lambda$. FLIDs are of considerable interest in
mathematical music theory; for instance, they contribute to results that incorporate
the Discrete Fourier Transform [9, Sect. 4.3.3].

Similarly, sets that are ADSs have near-flat interval distributions (NFLIDs).
The interval vector for such a set displays $t$ coordinates that are equal to $\lambda$, and
$v - 1 - t$ coordinates that equal $\lambda + 1$. For instance, the set $D$ in Example 5 is a
(12, 8, 5, 10) ADS: its interval vector, $IV(D) = (8, 5, 5, 5, 5, 5, 6, 5, 5, 5, 5, 5)$, has
$t = 10$ coordinates that are equal to $\lambda = 5$, and $v - 1 - t = 1$ coordinate that
equals $\lambda + 1 = 6$. Like FLIDs, NFLIDs are also of music-theoretical and
compositional interest. The classical example is the set of all-interval tetrachords from
pitch-class set theory, the members of which appear frequently in the post-tonal
repertoire and its theories. These 48 sets (members of set classes $[0,1,4,6]_{12}$ and
$[0,1,3,7]_{12}$) share the interval vector (4, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1, 1). Accordingly,
they are (12, 4, 1, 10) ADSs.
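The interval vectors quoted above are easy to recompute. The following sketch assumes the GIS of Example 3 (pitch classes in $\mathbb{Z}/12\mathbb{Z}$ with intervals $t - s$ modulo 12); the function name is an illustrative choice. It tallies directed intervals over all ordered pairs and then enumerates every four-element subset whose vector matches that of the all-interval tetrachords.

```python
from itertools import combinations

def interval_vector(D, v=12):
    """Tally directed intervals t - s (mod v) over all ordered pairs in D x D."""
    vec = [0] * v
    for s in D:
        for t in D:
            vec[(t - s) % v] += 1
    return tuple(vec)

# Interval vector shared by the all-interval tetrachords.
AIT = (4, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1, 1)

example5 = interval_vector({0, 1, 2, 3, 4, 6, 8, 9})
tetrachords = [c for c in combinations(range(12), 4)
               if interval_vector(c) == AIT]

print(example5)
print(len(tetrachords))
```

The first coordinate of each vector counts the identity interval and therefore equals the size of the set.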

2 A set D of size k will always display k occurrences of the identity interval. Typically, 
this is shown as the first coordinate in an interval vector. 
One special property of the members of set classes $[0,1,4,6]_{12}$ and $[0,1,3,7]_{12}$
is that they are the smallest pitch-class sets (in this case, the only pcsets with
k = 4) to contain at least one of every interval in their interval group G = Z/12Z;
hence their name, “all-interval tetrachords.” As such, they demonstrate
remarkable efficiency in their compactness and intervallic completeness. This attribute
is common to FLIDs and NFLIDs with λ = 1. To that end, we survey the PADSs
of small order. Table 1 presents a list of the 3,800 proper and improper (i.e., those
with t = v - 1) PADSs of order 2 ≤ n ≤ 6. These sets, which appear in fourteen
isomorphism classes of groups, include cyclic, abelian, and non-abelian PADSs.
In the following subsections, we consider the equivalence classes of these PADSs
in order of ascending values for n. The group-theoretical data was collected
primarily using computer testing in GAP [10] (see the Appendix for sample code).


Table 1. PADSs of small order (2 ≤ n ≤ 6)

n    G              (v, k, λ, t)       No. of PADSs
2    Z_6            (6, 3, 1, 4)       12
2    Z_7            (7, 3, 1, 6)       14
3    D_8            (8, 4, 1, 2)       16
3    Z_12           (12, 4, 1, 10)     48
3    Z_13           (13, 4, 1, 12)     52
4    SD_16          (16, 5, 1, 10)     128
4    S_3 × Z_3      (18, 5, 1, 14)     108
4    Z_7 ⋊ Z_3      (21, 5, 1, 20)     294
4    Z_21           (21, 5, 1, 20)     42
5    A_4 × Z_2      (24, 6, 1, 16)     192
5    Z_2^3 × Z_3    (24, 6, 1, 16)     1344
5    Z_14 × Z_2     (28, 6, 1, 24)     728
5    Z_31           (31, 6, 1, 30)     310
6    D_8 ∘ Q_8      (32, 7, 1, 20)     512
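The parameter t in each row is determined by v, k, and λ through a counting argument: a set of size k has k(k - 1) ordered pairs of distinct elements, and these are distributed as λ occurrences of t intervals and λ + 1 occurrences of the other v - 1 - t intervals. The sketch below (the function name `t_from_counts` is only illustrative) solves this relation for t and evaluates it for the (v, k) pairs of Table 1.

```python
def t_from_counts(v, k, lam=1):
    """Solve k(k-1) = lam*t + (lam+1)*(v-1-t) for t."""
    return (lam + 1) * (v - 1) - k * (k - 1)

# (v, k) pairs taken from Table 1; lam = 1 throughout.
pairs = [(6, 3), (7, 3), (8, 4), (12, 4), (13, 4), (16, 5), (18, 5),
         (21, 5), (24, 6), (28, 6), (31, 6), (32, 7)]
table_t = {(v, k): t_from_counts(v, k) for v, k in pairs}
for (v, k), t in table_t.items():
    print((v, k, 1, t))
```

The same relation holds for larger λ; for instance, it gives t = 10 for the (12, 8, 5, 10) ADS of Example 5.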


In all but one of the groups G below, the complete set of PADSs consists of
the orbit of any one PADS D ⊂ S under the action of the normalizer of G in the
symmetric group on S, $N_{\mathrm{Sym}(S)}G$. In the cyclic case, this normalizer is also the
affine group. In the non-cyclic abelian and non-abelian cases, the normalizer acts
as an analog for the affine group. The one exceptional case occurs in Subsect. 4.5,
in the discussion of n = 5 PADSs.


4.2 PADSs with n = 2 

Two equivalence classes of PADSs with n = 2 exist: those in $\mathbb{Z}/(n^2+n)\mathbb{Z}$ and
in $\mathbb{Z}/(n^2+n+1)\mathbb{Z}$. As both these groups are cyclic, so too are their constituent
PADSs. The 12 PADSs in Z/6Z have t = 4; hence, they are proper PADSs. In
contrast, the 14 PADSs in Z/7Z are improper, as t = v - 1 = 6.³ In either group,
the set $\mathfrak{D}$ of all PADSs consists of the orbit of some D under the action of the
relevant affine group, which is isomorphic in both cases to the dihedral group
$D_{2v}$. In music-theoretical parlance, the members of $\mathfrak{D}$ belong to two transposition
classes of size v, related to one another by inversion.

A musical representation of PADSs in the former case (v = 6) can be found
in the following GIS. Let S = {0, 2, 4, 6, 8, 10} be the six degrees of a whole-tone
scale, on which the additive group G = 2Z/12Z has a simply transitive action.
Then, the set D = {0, 2, 6} (and its various transpositions and inversions) is a
(6, 3, 1, 4) PADS, as demonstrated below.

2 - 0 = 2 (modulo 12)
6 - 2 = 4
6 - 0 = 0 - 6 = 6
2 - 6 = 8
0 - 2 = 10

The case of v = 7 has a similar musical representation in the GIS of seven
diatonic scale degrees, acted on by the cyclic group of diatonic transposition 
operators. 
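The counts for n = 2 can be checked by exhaustive search. The sketch below is a Python stand-in for the GAP procedure of the Appendix, restricted to cyclic groups: following step 8 of the Appendix, it treats a k-subset of Z/vZ as a PADS when every non-identity interval occurs once or twice and at least one occurs exactly once. The names `is_pads` and `cyclic_pads` are hypothetical.

```python
from collections import Counter
from itertools import combinations

def is_pads(D, v):
    """True if each non-zero interval of Z/vZ occurs once or twice among the
    ordered differences of D, and at least one occurs exactly once."""
    D = list(D)
    counts = Counter((b - a) % v for a in D for b in D if a != b)
    values = [counts[i] for i in range(1, v)]
    return all(c in (1, 2) for c in values) and 1 in values

def cyclic_pads(v, k):
    """All k-subsets of Z/vZ that pass the test above."""
    return [set(c) for c in combinations(range(v), k) if is_pads(c, v)]

for v in (6, 7):
    print(v, len(cyclic_pads(v, 3)))
```

Under this criterion the search finds two transposition classes in each group, for example those of {0, 1, 3} and {0, 1, 4} in Z/6Z.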

4.3 PADSs with n = 3 

The equivalence classes of PADSs with n = 3 include two classes with similar
structure to those with n = 2 above. We find cyclic PADSs in both
$\mathbb{Z}/(n^2+n)\mathbb{Z}$ and $\mathbb{Z}/(n^2+n+1)\mathbb{Z}$. Now, however, the orbits under the respective affine
groups are larger, as are the affine groups themselves: Z/12Z contains 4v = 48
(12, 4, 1, 10) proper PADSs (the all-interval tetrachords of traditional pitch-class
set theory) and Z/13Z contains 4v = 52 (13, 4, 1, 12) improper PADSs, the
transposition-and-inversion classes of $\{0,1,4,6\}_{13}$ and $\{0,1,3,9\}_{13}$.

The case of n = 3 also includes the smallest non-abelian PADSs. There
exist 16 (8, 4, 1, 2) PADSs in $D_8$. A musical representation occurs in the simply
transitive action of the dihedral group $G = \{T_0, T_3, T_6, T_9, I_1, I_4, I_7, I_{10}\}$ of
pitch-class operations on the octatonic pitch-class set S = {0, 1, 3, 4, 6, 7, 9, 10}. Then,
the set D = {0, 1, 4, 6} is an example of a PADS for this GIS.
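The octatonic example can be verified directly. The sketch below assumes the usual conventions T_n(x) = x + n and I_n(x) = n - x (modulo 12) and takes the eight operations T_0, T_3, T_6, T_9, I_1, I_4, I_7, I_10 as the group; the interval from s to t is taken to be the unique operation sending s to t.

```python
from collections import Counter

S = [0, 1, 3, 4, 6, 7, 9, 10]          # octatonic collection

# Assumed conventions: T_n(x) = x + n and I_n(x) = n - x (mod 12).
G = {f"T{n}": (lambda x, n=n: (x + n) % 12) for n in (0, 3, 6, 9)}
G.update({f"I{n}": (lambda x, n=n: (n - x) % 12) for n in (1, 4, 7, 10)})

# Simple transitivity: from each s, the eight operations reach every element of S once.
simply_transitive = all(sorted(op(s) for op in G.values()) == S for s in S)

def interval(s, t):
    """Name of the unique operation in G that maps s to t."""
    (name,) = [g for g, op in G.items() if op(s) == t]
    return name

D = [0, 1, 4, 6]
counts = Counter(interval(s, t) for s in D for t in D if s != t)
print(simply_transitive, dict(counts))
```

Two non-identity operations (T3 and T9) occur once and the other five occur twice, which matches the parameters (8, 4, 1, 2).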

4.4 PADSs with n = 4 

Whereas we find cyclic PADSs in $\mathbb{Z}/(n^2+n+1)\mathbb{Z}$ for n = 4, with n ≥ 4, we cease
to find cyclic PADSs in $\mathbb{Z}/(n^2+n)\mathbb{Z}$. We conjecture that the reason is related to
the non-existence of perfect Golomb rulers with five or more marks.⁴ In addition

3 The fourteen PADSs in Z/7Z meet the stricter definition of being PDSs. 

4 A Golomb ruler with k marks and length L has a different measurement between 
any two marks. A perfect Golomb ruler is one in which every distance from 1 to L 
appears once as such a difference [11]. 
to the 42 cyclic (21, 5, 1, 20) PADSs in Z/21Z, we also find 294 non-abelian
(21, 5, 1, 20) PADSs in Z/7Z ⋊ Z/3Z.⁵ As the PADSs in these sets share the
same (v, k, λ, t) parameters, they possess the smallest order for non-equivalent,
yet isomorphic PADSs.

We find an example of a GIS that includes these non-abelian PADSs in
the set of unordered dyadic subsets of a diatonic collection, as represented by
$S = \binom{\mathbb{Z}/7\mathbb{Z}}{2}$. Let $G = \langle x, y \mid x^7 = y^3 = 1;\ yx = x^2 y \rangle = \mathbb{Z}/7\mathbb{Z} \rtimes \mathbb{Z}/3\mathbb{Z}$ be a
non-abelian group with a regular action on S, where, for all {s, t} ∈ S,

{s, t} · x = {s + 1, t + 1} (modulo 7)

and

{s, t} · y = {2s, 2t} (modulo 7).

Then, the set D = { {0, 2}, {0, 5}, {1, 5}, {4, 5}, {4, 6} } is a (21, 5, 1, 20) PADS.
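A short check of this example is sketched below. It assumes that x and y generate the 21 maps z -> az + b (modulo 7) with a in {1, 2, 4}, applied elementwise to a dyad, and that the interval from one dyad to another is the unique such map carrying the first to the second; group elements are represented as pairs (a, b), which is a choice made here for illustration.

```python
from itertools import combinations

dyads = {frozenset(c) for c in combinations(range(7), 2)}   # the 21 unordered dyads

# Assumption: x (z -> z + 1) and y (z -> 2z) generate the maps z -> a*z + b, a in {1, 2, 4}.
G = [(a, b) for a in (1, 2, 4) for b in range(7)]

def act(g, dyad):
    a, b = g
    return frozenset((a * z + b) % 7 for z in dyad)

# Regular action: from each dyad, the 21 maps reach every dyad exactly once.
regular = all({act(g, d) for g in G} == dyads for d in dyads)

def interval(p, q):
    """The unique group element (a, b) that maps dyad p to dyad q."""
    (g,) = [g for g in G if act(g, p) == q]
    return g

D = [frozenset(p) for p in ({0, 2}, {0, 5}, {1, 5}, {4, 5}, {4, 6})]
intervals = [interval(p, q) for p in D for q in D if p != q]
print(regular, len(intervals), len(set(intervals)))
```

All 20 ordered pairs of distinct dyads yield distinct non-identity group elements, so every non-identity interval occurs exactly once (t = v - 1 = 20).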

The case with n = 4 also includes two additional equivalence classes of
non-abelian PADSs. The semidihedral group of order 16, $SD_{16}$, contains 128
(16, 5, 1, 10) PADSs, and the direct product $S_3$ × Z/3Z contains 108 (18, 5, 1, 14)
PADSs.

4.5 PADSs with n = 5 

There exist four equivalence classes of PADSs with n = 5: three that include
proper PADSs, and one that includes improper PADSs. The one class of improper
sets occurs in Z/31Z, where we find 310 cyclic (31, 6, 1, 30) PADSs. A musical
example of such a set can be found in the integer representation of 31-tone equal
temperament [13], in which the set $D = \{1, 2, 4, 9, 13, 19\}_{31}$ has a FLID with
λ = 1. The three classes of proper PADSs include two that are abelian and
one that is non-abelian. The non-abelian class, which includes 192 (24, 6, 1, 16)
PADSs, is found in $A_4 \times \mathbb{Z}_2$. A class of 1344 abelian (24, 6, 1, 16) PADSs appears
in $\mathbb{Z}_2^3 \times \mathbb{Z}_3$. With n = 5, this group possesses the smallest order of non-cyclic
abelian PADSs. These two classes share the same (v, k, λ, t) parameters; hence,
their PADSs are isomorphic.

The third equivalence class of proper PADSs with n = 5 consists of 728
(28, 6, 1, 24) PADSs in the abelian group $\mathbb{Z}_{14} \times \mathbb{Z}_2$. It is the one exceptional class
to which we allude in Subsect. 4.1, wherein the set $\mathfrak{D}$ does not consist of the
orbit of a single D under the action of $N_{\mathrm{Sym}(S)}G$. Rather, $\mathfrak{D}$ is the union of
three orbits: two orbits of size 336, and one orbit of size 56. As such, these three
classes are GISZ-related [14].

A musical representation of these PADSs exists in the following GIS. Let S
consist of the set of all dyadic seconds, thirds, sixths, and sevenths in a C-major
diatonic collection; we note that S is of size 28. Let G be a group isomorphic
to $\mathbb{Z}_{14} \times \mathbb{Z}_2$, generated by the following operations g, h on S. First, g is a cycle
of length 14, consisting of two chains of suspensions: one of 7-6 suspensions,

5 A standard result in the theory of DSs states that for every cyclic PDS with k ≡ 2
(modulo 3) (in this case, k = 5) there exists an isomorphic non-abelian PDS [12].
and one of 2-3 suspensions. We model this cycle in Z/28Z, putting the diatonic
third (C,E) = 0, the second (C,D) = 1, the third (B,D) = 2, etc., through the
second (D,E) = 13; and the same for sixths and sevenths, beginning with the
sixth (E,C) = 14, and running through the seventh (E,D) = 27. Hence, g =
(0, 1, 2, ..., 13)(14, 15, 16, ..., 27). Second, h exchanges every dyadic second with the
seventh that contains the same scale degrees, and the same for thirds and sixths.
That is, h = (0,14)(1,15)...(13,27). Then, the PADSs $D_1$ = {0, 1, 3, 7, 15, 24}
and $D_2$ = {0, 1, 7, 10, 15, 17} have different orbits of length 336 under the action
of the normalizer, and $D_3$ = {0, 1, 3, 13, 20, 27} has a third orbit of length 56.


4.6 PADSs with n = 6 

A classical conjecture in the theory of DSs is that no PDSs exist for any n that
is not a power of a prime; this conjecture has been verified through n <
2,000,000 [5, Remark 18.68]. Accordingly, as n = 6 is not a power of a prime,
we find no improper PADSs of that order. However, one equivalence class of
proper PADSs exists for n = 6: we find 512 non-abelian (32, 7, 1, 20) PADSs in
the central product $D_8 \circ Q_8$.⁶

In what follows, we construct a GIS that contains these PADSs. S is the set of
32 trichords in the octatonic collection {0, 1, 3, 4, 6, 7, 9, 10} that do not
contain a tritone. The interval group G that has a simply transitive action on
this set is the central product of a dihedral group of order 8 and a quaternion
group, also of order 8. The dihedral group can be generated by the pitch-class
operations $T_3$ and $I_1$ on the members of S. The quaternion group can be
generated by the following two 4-cycles: $Q_1$ alternates the pitch-class operations
(0,6)(4,10) and (1,7)(3,9) on the members of S; and $Q_2$ applies the
pitch-class operation (0,1,6,7)(3,10,9,4) to members of the transposition classes
(0,1,4) and (0,4,7) in S, and the inverse operation (0,7,6,1)(3,4,9,10) to the
transposition classes that are the inverted forms of these trichords. Then, the
set D = { {0,3,7}, {0,4,7}, {0,4,9}, {1,6,10}, {3,4,7}, {4,7,9}, {7,9,10} } is a
(32, 7, 1, 20) PADS.

5 Conclusions and Future Work 

In the preceding sections, we have examined how ADSs are a generalization of
DSs, and how certain of the same properties that have attracted music scholars
to PDSs are shared by PADSs. In particular, PADSs display maximum
intervallic efficiency. Further, we have investigated several GISs that include
representations of cyclic, abelian, and non-abelian PADSs. Further work remains regarding
ADSs, particularly with regard to music-analytical applications (and especially
those that involve ADSs with λ ≥ 2). It is our hope that the ideas presented
here serve as a departure point for some of this future work.


6 This group is also known as the extra special group of order $2^{4+1}$ of minus type.
Appendix

The following GAP code may be used to verify the results of Sect. 4 regarding
the existence of PADSs of small order.

1. Determine all isomorphism classes of groups of order v.

G := AllGroups(v);

2. For each group G_i, define an isomorphism to a permutation representation
on the set S = {1, ..., v}.

Ii := IsomorphismPermGroup(G[i]);

3. Generate the permutation group.

Pi := Image(Ii);

4. Output all subsets D_j of size k in S.

D := Combinations([1 .. v], k);

5. For each subset, output all pairs of elements.

A := Combinations(D[j], 2);

6. Determine the permutation in P that takes the first element in each pair to
the second.

Rj := RepresentativeAction(P, A[j][m], A[j][n]);

7. Generate the inverse element for each of the above permutations.

Vj := Rj^-1;

8. If the set of all Rs and Vs for any D_j covers the non-identity elements of G_i,
and if some non-identity element of G_i is represented only once as an R or
a V and the remaining non-identity elements are represented once or twice,
then D_j is a PADS.
Abstract. The paper focuses on mathematical aspects of harmonies in
extended just intonation and their relations. The first part lays down a
theoretical framework for the investigation of structural features of such
harmonies. Among other aspects, it addresses symmetry, inversion, and
multiplication of harmonies. The second part explores transformational
relations among harmonies of the same type, while the approach is
intrinsically dualistic. Riemann-Klumpenhouwer’s concepts of Schritts and
Wechsels are generalized for ‘harmony spaces’ in extended just
intonation. This enables a deeper analysis of harmonic ‘neighborhoods.’ Finally,
a graphical representation of the complete neighborhood of a harmony,
called ‘neighborhood network,’ is presented along with several simpler
and more complex examples.


Keywords: Extended just intonation • Transformation • Schritt • Wechsel


Two irreconcilable principles have shaped musical theories for a long time:
purity of consonance and regularity of structure. The latter seems to have
prevailed since the Baroque era. The concept of twelve tones equally spaced around
the pitch circle, a regular structure par excellence, has enabled amazing
achievements. Yet, despite its inferior role in the post-Baroque period, the former
principle, the principle of (extended) just intonation, has played leading roles in other
musical cultures. Ancient Greek, Renaissance, various kinds of ethnic music, or
certain contemporary microtonal approaches: all have searched for just
consonances. The crucial disadvantage of the consonance principle is a conceptual
complexity of resulting structures, an issue to which the regular systems provide
a cure, albeit at the expense of purity. This paper proposes an alternative
conceptual framework. It does not compromise the principle of harmoniousness and still
it yields efficient means for manipulating complex pure harmonies. Its
applications are at least two-fold: it exposes interesting structural features of harmonies
observed in existing music, and, more importantly, it enables explorations of new
musical territories. The latter is our main motivation. The purpose of this paper
is to summarize theoretical aspects as a preparation for developing specialized
software for manipulating complex harmonies in extended just intonation.
1 Introduction

The 12-tone equal temperament overwhelmingly dominates contemporary music
and theory. For many it seems an ideal solution that has been achieved through
a long evolution. Very often we forget that it is an imperfect compromise
that distorts the most important aspect of music: its harmoniousness. While
the impure 12-tone equal temperament prevails in western post-Baroque music,
most other musical cultures prefer tone systems based on the principle of
just intonation. Various oriental or ancient Greek music theories emphasize
harmoniousness of musical intervals based on simple integer ratios and develop
manifold tone systems in just intonation [1,3]. The same principle of just
consonances also underpinned music theorizing in the West for many centuries. Until
quite recently, the idea that music should be restricted to 12 tones per octave
was foreign to European music. Except for instruments with fixed tuning, even
today G♯ and A♭ are two different tones in actual musical practice. However, the
difference between enharmonic tones is not formalized in the standard tone
system and is achieved through fine tuning by ear only. Since the Renaissance, several
theorists and musicians have tried to define explicit tone systems that would
capture such distinctions. A 31-tone system is probably the most famous among
them. It was proposed and used in actual practice by Vicentino [20] in the
sixteenth century and independently described by the seventeenth-century Dutch
scientist Huyghens [8]. It was noted by several later theorists [14,21,22] and most
notably by Fokker [5], who also put it into modern practice. Henk Badings,
among several other Dutch composers, applied the 31-tone system in his music.

In contrast to contemporary understanding, the principle of just intonation 
was part of mainstream music theory still in the nineteenth century. This can be 
nicely illustrated by comparing neo-Riemannian [7] and ‘Riemannian’ Tonnetze. 
The two-dimensional lattice representation of pitches has enjoyed considerable 
popularity in recent music analysis and theory, especially in the neo-Riemannian 
circles. The term Tonnetz emphasizes its derivation from the nineteenth-century 
German music theory. However, there is a crucial conceptual difference between 
modern and original Riemannian Tonnetz. As Cohn explicitly discusses in his 
seminal introduction to the 1998 neo-Riemannian special issue of the Journal
of Music Theory [2], the neo-Riemannian Tonnetz assumes enharmonic equivalence,
limiting the pitch-class space to 12 items, while the nineteenth-century German
theorists often used the lattice constructions to demonstrate a potentially infinite 
tonal space based on just intonation. For instance, Tanaka [18], Oettingen [13], 
and even Riemann [17] derived (different) 53-tone systems in just intonation 
using non-circular infinite Tonnetz. Related 53-tone systems were discussed also 
by several other theorists [5,19,21]. More recently, Ben Johnston explored a
53-tone system in 5-limit just intonation both in his writings [9] and his earlier
microtonal music (e.g. String Quartet no. 2). 

The most significant drawback of the principle of extended just intonation
is the vastness of resulting tonal spaces. Tones in (extended) just intonation
can be represented as rational numbers. The 5-limit systems (generated only
by the intervals of perfect fifth and just major third) can be represented on the
two-dimensional Tonnetz and the 7-limit systems (the interval of natural seventh
is added) can be represented on a 3-dimensional lattice. If just intervals based
on higher prime numbers are added we arrive at even higher dimensional spaces.
The problem is to find a practicable way of navigating through such complex
spaces. Johnston’s 53-tone and Fokker’s 31-tone systems illustrate one strategy
of navigating through the 5-limit and 7-limit spaces: take a finite selection that
covers the most important part of the infinite space. The tonality diamond, invented
by Meyer [12] and popularized by Partch [15], reflects the same strategy applied
to the 11-limit space. This strategy enables building physical instruments for
accessing the complex spaces of extended just intonation. However, its limitation
is that the instruments are quite complex and still they make available only
fragments of the infinite spaces. In his later microtonal work, Ben Johnston
applied a different approach. His extended musical notation has enabled him
to access vast parts of a space in extended just intonation. As he comments on
his String Quartet no. 5: ‘The music is highly modulatory, thus involving a very
large number of different pitches per octave, which since they are not being used
as a scale, I did not bother to count’ [10].

This paper follows a similar strategy. It assumes extended just intonation
(without any temperament) and does not impose a limitation through simple
selection of tones. Instead, it proposes a mathematical model that provides
navigation through a complex space of harmonies in extended just intonation. The
main idea comes from the neo-Riemannian theory of transformations.
Klumpenhouwer’s [11] concepts of Schritts and Wechsels are applied to harmonies in
extended just intonation.


2 Harmonies 


2.1 Basic Definitions: In the Footsteps of Euler 


Euler’s Tentamen novae theoriae musicae [4] offers a simple yet powerful
mathematical framework for describing harmonies in just intonation. Even after nearly
three centuries, it still may serve as a source of inspiration. Some of the notions
defined below, such as index and exponent, follow Euler’s ideas and terminology.

A harmony is a set of positive rational numbers. Its elements are called tones.
Furthermore, consider positive integers $x_1, \ldots, x_n$ such that $\gcd(x_1, \ldots, x_n) = 1$.
Then we say that the set $X = \{x_1, \ldots, x_n\}$ is a canonic harmony.

Lemma 1. Consider a harmony $H = \{\frac{a_1}{b_1}, \ldots, \frac{a_n}{b_n}\}$, where $a_i$ and $b_i$ are pairs
of relatively prime positive integers for all i = 1, ..., n. Then there exist
a unique rational number ind(H) and a unique canonic harmony T(H) such that
H = ind(H)T(H).


We say that ind(H) and T(H) are the index and the type of the harmony H,
respectively. The proof is straightforward if we put:

$$\mathrm{ind}(H) = \frac{\gcd(a_1, \ldots, a_n)}{\mathrm{lcm}(b_1, \ldots, b_n)}, \qquad \text{(index)}$$

$$T(H) = \frac{H}{\mathrm{ind}(H)}. \qquad \text{(type)}$$
In the next step, we introduce two more characteristics of a harmony.
The counter-index cind(H) and the exponent exp(H) are given by the following
equations:

$$\mathrm{cind}(H) = \frac{\mathrm{lcm}(a_1, \ldots, a_n)}{\gcd(b_1, \ldots, b_n)}, \qquad \text{(counter-index)}$$

$$\exp(H) = \mathrm{cind}(H)/\mathrm{ind}(H). \qquad \text{(exponent)}$$


The last characteristic, the exponent, is crucial because it is invariant for all
harmonies of the same type. The following lemma confirms this statement.

Lemma 2. Consider a harmony H of type T(H). Then its exponent is the least
common multiple of the elements of its type: exp(H) = lcm(T(H)). This means
that all harmonies of the same type share the same exponent.

The following notation is assumed for transcriptions of rational numbers as
musical tones in 7-limit just intonation. 1 corresponds to tone D. Unless
necessary, octaves are not specified. If necessary, a numerical coefficient is attached:
harmony {1, 2} is the interval $D_0 D_1$. Powers of 3/2 are transcribed through the
chain of fifths with the usual system of sharps and flats: 2/3 is G, 4/9 is C, 81/16
is F♯, and so on. Corrections by syntonic comma are shown with an
apostrophe, which is placed in lowered or raised position to show correction by
the comma downwards or upwards, respectively. So DF♯, is a just major third
{1, 5/4} = {1, (81/64)(80/81)}. The corrections by septimal comma 64/63 are denoted by the
slash or backslash symbols for upward and downward corrections, respectively.
Thus, DC\ denotes a harmonic seventh interval.

To illustrate the basic concepts defined above, consider the C major
scale in a usual just intonation: C D E, F G A, B,. It corresponds to the
harmony H = {8/9, 1, 10/9, 32/27, 4/3, 40/27, 5/3}. Its index is ind(H) = 1/27, counter-index
cind(H) = 160, exponent exp(H) = 4320 = $2^5 \cdot 3^3 \cdot 5^1$, and its type T(H) =
{24, 27, 30, 32, 36, 40, 45}.
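These quantities are easy to compute with exact rational arithmetic. The sketch below is one possible implementation of the definitions of index, counter-index, type, and exponent; the function names are illustrative, and the scale tones are written out explicitly as the values implied by the stated type {24, ..., 45}, counter-index 160, and exponent 4320.

```python
from fractions import Fraction
from functools import reduce
from math import gcd

def lcm(a, b):
    return a * b // gcd(a, b)

def index(H):
    """ind(H) = gcd of numerators / lcm of denominators (fractions in lowest terms)."""
    return Fraction(reduce(gcd, [h.numerator for h in H]),
                    reduce(lcm, [h.denominator for h in H]))

def counter_index(H):
    """cind(H) = lcm of numerators / gcd of denominators."""
    return Fraction(reduce(lcm, [h.numerator for h in H]),
                    reduce(gcd, [h.denominator for h in H]))

def harmony_type(H):
    """T(H) = H / ind(H), a canonic harmony."""
    return {h / index(H) for h in H}

def exponent(H):
    return counter_index(H) / index(H)

# Assumption: C major scale with D = 1, as implied by the stated type and exponent.
H = {Fraction(8, 9), Fraction(1), Fraction(10, 9), Fraction(32, 27),
     Fraction(4, 3), Fraction(40, 27), Fraction(5, 3)}
print(index(H), counter_index(H), exponent(H), sorted(harmony_type(H)))
```

In line with Lemma 2, the exponent computed this way coincides with the least common multiple of the elements of the type.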


2.2 Multiplication of Harmonies 

Now we will define a binary operation on the set of harmonies. Let A and B be 
two harmonies. Then their multiplication is defined by the following formula: 


AB = {ab | a ∈ A, b ∈ B}. (multiplication of harmonies)

This operation is associative and commutative. It also has a neutral element, 
which is the trivial harmony {1}. However, obviously, there are no inverse ele¬ 
ments except for trivial harmony {1}. So the algebra of harmonies based on the 
operation of multiplication is a commutative monoid. 

We say that a harmony H is composite if there are non-trivial harmonies A 
and B such that H = AB. A harmony is called prime if it is not composite. 
Consider the following equation for pairwise distinct prime harmonies $P_1, \ldots, P_n$
and positive integers $k_1, \ldots, k_n$.

$$H = P_1^{k_1} \cdots P_n^{k_n}. \qquad \text{(prime factorization)}$$

The right side is called a prime factorization of harmony H. Any harmony has
at least one prime factorization. However, it is not necessarily unique.

Harmonies whose prime factorization is a power of a single interval (2-element
harmony) exhibit special structural features. We say that a harmony H is a chain
generated by the interval {1, r} if $H = q\{1, r\}^n$ for positive rational numbers q
and r and a positive integer n. The usual Pythagorean scales are chains generated
by the perfect fifth {1, 3}.

Prime factorization provides crucial insights into the internal structure of a 
harmony. Let us compare two scales whose intervallic structures are significantly 
different: the diatonic scale and Hungarian minor scale. (We disregard octave 
differences in this example. It means that factors of 2 are removed from rational 
numbers and such rational numbers might be interpreted as ‘harmony classes.’) 

Prime factorization of the diatonic scale of D major is {1, 3, 5, 9, 15, 27, 45} =
$\{1, 3\}^2 \{1, 3, 5\}$. It explicitly demonstrates the structure of the scale: it is comprised
of major triads transposed to positions given by a chain of three perfect fifths.
There is no doubt that major triads and the three perfect fifths (corresponding
to subdominant, tonic, and dominant) are the structural basis of the scale.

This harmony also illustrates the non-uniqueness of prime factorization. Its
other factorization is {1, 3}{1, 5, 9, 15}. The second factor is Ptolemy’s tense
diatonic tetrachord [1]. This factorization represents a different structural
decomposition of the diatonic scale, typical for classical ancient Greek music theory: it
comprises two tetrachords at the distance of a perfect fifth.

Now consider the Hungarian minor scale {1/15, 1/5, 1/3, 1, 3, 5, 15}. Its prime
factorization is {1, 3}{1, 5}{1, 1/15}. It means that it is a multiplication of the perfect
fifth, just major third, and diatonic semitone. Again, this nicely highlights the
structural significance of the three intervals in the Hungarian minor scale.
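The factorizations discussed in this subsection can be checked by multiplying the factors back together. The following sketch implements the multiplication of harmonies on sets of exact fractions; the helper names `multiply` and `product` are illustrative, and the Hungarian minor factors are taken with the diatonic semitone written octave-free as {1, 1/15}, which is an assumption about the intended values.

```python
from fractions import Fraction
from functools import reduce

def multiply(A, B):
    """AB = {ab | a in A, b in B}."""
    return {Fraction(a) * Fraction(b) for a in A for b in B}

def product(*factors):
    """Multiply any number of harmonies; {1} is the neutral element."""
    return reduce(multiply, factors, {Fraction(1)})

fifth = {1, 3}
diatonic = product(fifth, fifth, {1, 3, 5})           # {1,3}^2 {1,3,5}
diatonic_alt = product(fifth, {1, 5, 9, 15})          # {1,3}{1,5,9,15}
hungarian = product(fifth, {1, 5}, {1, Fraction(1, 15)})

print(sorted(diatonic))
print(sorted(hungarian))
```

Both factorizations of the diatonic scale yield the same seven-tone harmony, which illustrates the non-uniqueness of prime factorization.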


2.3 Inverses and Symmetries 

In this subsection we investigate harmonies constructed from inverted tones.
We define two related transformations: the context-independent dual and the
exponent Wechsel, which is contextual.

Consider a harmony $H = \{h_1, \ldots, h_n\}$. We say that the harmony $H' =
\{h_1^{-1}, \ldots, h_n^{-1}\}$ is the dual of the harmony H. Now consider another
transformation that maps harmony H to the harmony $W_e(H) = \mathrm{ind}(H)\exp(H)T(H)' =
\mathrm{cind}(H)T(H)'$. We call it exponent Wechsel. (The naming convention will become
clearer in the context of a theory of Schritts and Wechsels discussed later.)

As an example consider the C major triad $C^+$ = {8/9, 10/9, 4/3} and D major
triad $D^+$ = {1, 5/4, 3/2}. Then their dual harmonies are the A minor triad {9/8, 9/10, 3/4}
and the G minor triad {1, 4/5, 2/3}. On the other hand, the exponent Wechsel results in
the E, minor triad {20/9, 8/3, 10/3} and the F♯, minor triad {5/2, 3, 15/4}. It means that the dual is a
context-independent inversion with the tone D in the centre of symmetry. On the
other hand, the exponent Wechsel is a contextual transformation corresponding
to the neo-Riemannian Leittonwechsel (for the specific case of triads).
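The two transformations can be compared on the triads of this example. The sketch below computes the dual and the exponent Wechsel from the definitions, using W_e(H) = cind(H) T(H)' with T(H) = H / ind(H); the triads are entered as {8/9, 10/9, 4/3} and {1, 5/4, 3/2}, which is an assumption about the intended tone values (D = 1), and all function names are illustrative.

```python
from fractions import Fraction as F
from functools import reduce
from math import gcd

def lcm(a, b):
    return a * b // gcd(a, b)

def index(H):
    return F(reduce(gcd, [h.numerator for h in H]),
             reduce(lcm, [h.denominator for h in H]))

def counter_index(H):
    return F(reduce(lcm, [h.numerator for h in H]),
             reduce(gcd, [h.denominator for h in H]))

def dual(H):
    """H' = {1/h | h in H}."""
    return {1 / h for h in H}

def exponent_wechsel(H):
    """W_e(H) = cind(H) * T(H)', where T(H) = H / ind(H)."""
    c, i = counter_index(H), index(H)
    return {c / (h / i) for h in H}

C_major = {F(8, 9), F(10, 9), F(4, 3)}
D_major = {F(1), F(5, 4), F(3, 2)}
for triad in (C_major, D_major):
    print(sorted(dual(triad)), sorted(exponent_wechsel(triad)))
```

The dual always inverts around the tone 1 (here D), whereas the result of the exponent Wechsel depends on the index and counter-index of the harmony it is applied to.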

Further we say that a harmony S is symmetric if $S = W_e(S)$. For a
symmetric harmony $S = \{s_1, \ldots, s_n\}$ we define the point of symmetry sym(S) as the
geometric mean of the contained tones:

$$\mathrm{sym}(S) = \sqrt[n]{s_1 \cdots s_n}. \qquad \text{(point of symmetry)}$$

One can observe that $\mathrm{sym}^2(S) = \mathrm{ind}(S)\,\mathrm{cind}(S)$.

Trivially, all one-note and two-note harmonies are symmetric. Therefore, H 
is least symmetric if it contains no symmetric subsets but one- and two-tone 
subharmonies. In that case we say that harmony H is primitive. The concept 
of primitiveness will play an important role in the second part of this paper 
where we investigate transformations of harmonies. There is a relation between 
primitive and prime harmonies. 

Theorem 1. A harmony is primitive iff all its subharmonies are prime. 

The theorem has a simple corollary. If a harmony is primitive then it is prime.
The converse statement is not valid. For instance, the dominant seventh chord
{1, 3, 5, 9/5} is prime but not primitive because of a symmetric subset {5, 3, 9/5}
with the point of symmetry equal to 3.
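A brute-force search for symmetric subharmonies makes the example concrete. The sketch below uses the identity W_e(H) = {ind(H) cind(H) / h | h in H}, which follows from the definitions of the type and of the exponent Wechsel, and lists all symmetric subsets with at least three tones; the chord is entered as {1, 3, 5, 9/5}, an assumption about the intended tone values, and the function names are illustrative.

```python
from fractions import Fraction as F
from functools import reduce
from itertools import combinations
from math import gcd

def lcm(a, b):
    return a * b // gcd(a, b)

def ind_times_cind(H):
    """ind(H) * cind(H), the square of the point of symmetry when H is symmetric."""
    nums = [h.numerator for h in H]
    dens = [h.denominator for h in H]
    ind = F(reduce(gcd, nums), reduce(lcm, dens))
    cind = F(reduce(lcm, nums), reduce(gcd, dens))
    return ind * cind

def is_symmetric(H):
    """H is symmetric iff W_e(H) = {ind*cind / h : h in H} equals H."""
    c = ind_times_cind(H)
    return {c / h for h in H} == set(H)

def symmetric_subsets(H, min_size=3):
    return [set(s) for r in range(min_size, len(H) + 1)
            for s in combinations(sorted(H), r) if is_symmetric(s)]

chord = {F(1), F(3), F(5), F(9, 5)}
found = symmetric_subsets(chord)
print(found, [ind_times_cind(s) for s in found])
```

A harmony is primitive in the sense above exactly when this search returns an empty list.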

2.4 Transpositional Intersection 

In this section we define transpositional intersection. It is an important tool
when we investigate musical commonalities between two harmonies.

Consider two harmonies A and B. We define the set of transpositional
intersections of harmony A over harmony B, denoted ${}^{A}B$, by this formula:

$${}^{A}B = \{X \subseteq B \mid \exists q \in \mathbb{Q} : qA \cap B = X\}. \qquad \text{(transpositional intersections)}$$

Obviously, two different transpositions of a harmony A may have the same
intersections with a harmony B, i.e. for $q_1 \neq q_2$ we may have $q_1 A \cap B = q_2 A \cap B$.
Therefore, we define the set of transpositional coefficients for a harmony included
in the set of transpositional intersections of A over B:

$$\mathrm{transp}(X \in {}^{A}B) = \{q \in \mathbb{Q} \mid qA \cap B = X\}. \qquad \text{(transpositional coefficients)}$$

We say that A and B primitively intersect if |transp(X)| = 1 for all $X \in {}^{A}B$ (it
is easy to see that this relation is symmetric).

Finally, we consider intersections of a harmony and its dual. We define the
symmetry set $\mathcal{S}(A)$ of a harmony A:

$$\mathcal{S}(A) = {}^{A'}A. \qquad \text{(symmetry set)}$$

The following theorem summarizes properties of the symmetry set. This set
plays an important role in the description of canonic Wechsels investigated in
the second part of the paper.
Theorem 2. Let A be a harmony. Then the following statements hold.

(i) A and A' intersect primitively.

(ii) All elements of $\mathcal{S}(A)$ are symmetric.

(iii) $2|A| - 1 \le |\mathcal{S}(A)| \le \frac{1}{2}|A|(|A| + 1)$. The lower bound is achieved iff A is a
chain, the upper bound iff A is primitive.
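The bounds in (iii) can be observed on small examples. The sketch below computes the symmetry set as the collection of non-empty intersections of A with transpositions qA' of its dual; it relies on the observation that such an intersection can only be non-empty when q is a product of two tones of A, and it treats only non-empty intersections as elements of the symmetry set, which is an assumption about the intended definition.

```python
from fractions import Fraction as F

def symmetry_set(A):
    """Distinct non-empty intersections of q*A' with A, over all rational q > 0."""
    A = {F(a) for a in A}
    A_dual = {1 / a for a in A}
    # q*(1/a) = b for tones a, b of A forces q = a*b.
    candidates = {a * b for a in A for b in A}
    return {frozenset(q * d for d in A_dual) & A for q in candidates}

chain = {1, 3, 9}                      # generated by the perfect fifth {1, 3}
triad = {4, 5, 6}                      # has no symmetric subset of three tones
seventh = {1, 3, 5, F(9, 5)}           # prime but not primitive

for A in (chain, triad, seventh):
    n = len(A)
    print(2 * n - 1, len(symmetry_set(A)), n * (n + 1) // 2)
```

For the chain the count meets the lower bound, for the triad it meets the upper bound, and for the dominant seventh chord it falls strictly between the two.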


3 Transformations of Harmonies 


3.1 Schritts and Wechsels 


Let X be a canonic harmony and consider the set of all harmonies of type X
and X'. We call it the harmony space of type X and denote it $\mathcal{H}(X)$. Obviously, this
concept is dualistic and $\mathcal{H}(X) = \mathcal{H}(X')$.

We define two kinds of transformations on a harmony space. Let H be a
harmony from the harmony space $\mathcal{H}(X)$. Hence there exists a positive rational
number h such that H = hX or H = hX'. Furthermore, let q be any positive
rational number. Then Schritt $S_q$ and Wechsel $W_q$ are transformations defined
on $\mathcal{H}(X)$ by the following formulas:

$$S_q(hX) = hqX, \qquad S_q(hX') = hq^{-1}X', \qquad \text{(Schritt)}$$

$$W_q(hX) = hqX', \qquad W_q(hX') = hq^{-1}X. \qquad \text{(Wechsel)}$$

As one can observe, the definitions take inspiration from the homonymous
concepts as defined by Klumpenhouwer [11] in his seminal work reformulating
Hugo Riemann’s theories of harmony. Schritts preserve harmony types and
transpose dual harmonies in opposite directions. Wechsels change harmony types and
reflect dual harmonies in opposite directions.

The system of Schritts and Wechsels is an infinite group. Let us call it the
general SW-group. The trivial Schritt $S_1$ is its neutral element.


Lemma 3. Let q and r be positive rational numbers. Then:

$$S_q S_r = S_{qr}, \qquad W_q S_r = W_{q^{-1}r},$$

$$S_q W_r = W_{qr}, \qquad W_q W_r = S_{q^{-1}r}.$$
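The defining formulas translate directly into code. In the sketch below a harmony of the space of type X is stored as a pair (h, is_dual) standing for hX or hX'; this encoding, the choice X = {4, 5, 6}, and the function names are assumptions made for illustration. The checks cover only the two same-kind products, under the further assumption that in a product the right-hand factor is applied first.

```python
from fractions import Fraction as F

X = (4, 5, 6)                          # canonic major triad type (not symmetric)

def realize(state):
    """Tones of the harmony hX (is_dual False) or hX' (is_dual True)."""
    h, is_dual = state
    return {h / x for x in X} if is_dual else {h * x for x in X}

def schritt(q, state):
    h, is_dual = state
    return (h / q, True) if is_dual else (h * q, False)

def wechsel(q, state):
    h, is_dual = state
    return (h / q, False) if is_dual else (h * q, True)

q, r = F(3, 2), F(5, 4)
states = [(F(1, 4), False), (F(15), True)]

# Assumption: the right-hand factor of a product acts first.
ss_ok = all(schritt(q, schritt(r, s)) == schritt(q * r, s) for s in states)
ww_ok = all(wechsel(q, wechsel(r, s)) == schritt(r / q, s) for s in states)

d_major = (F(1, 4), False)             # (1/4)X = {1, 5/4, 3/2}
print(ss_ok, ww_ok, sorted(realize(wechsel(60, d_major))))
```

With q = lcm(4, 5, 6) = 60 the Wechsel reproduces the exponent Wechsel of the D major triad, {5/2, 3, 15/4}.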


3.2 Harmonic Neighborhood 

Consider a canonic harmony X and a set of tones T. Our aim is to explore all
harmonies from the harmony space $\mathcal{H}(X)$ that share at least one tone with T.
We call the set of all such harmonies the neighborhood of set T and denote it
$\mathcal{N}_X(T)$. Below we consider two important cases: neighborhood of a single tone
and neighborhood of a harmony belonging to $\mathcal{H}(X)$.

Assume that the canonic set $X = \{x_1, \ldots, x_n\}$ is of cardinality n. One easily
observes that the neighborhood of a single tone t contains exactly n distinct
Schritts and up to n distinct Wechsels: $\mathcal{N}_X(t) = \{S_{t x_j^{-1}}(X), W_{t x_j}(X) \mid j =
1, \ldots, n\}$. The total number of harmonies in $\mathcal{N}_X(t)$ may be less than 2n. This
happens iff the canonic set X is not symmetric. 

In the next step, we explore the neighborhood of a given harmony A of
type X. Let $B \in \mathcal{H}(X)$ be any harmony from the harmony space of type $X =
\{x_1, \ldots, x_n\}$ such that B belongs to the neighborhood of A, i.e. $B \in \mathcal{N}_X(A)$.
Select an element h from the intersection of A and B, i.e. $h \in A \cap B$. Considering
the types of A and B, there are four possible cases:

(i) T(A) = X and T(B) = X. Then we have $h = \mathrm{ind}(A)x_i = \mathrm{ind}(B)x_j$ for some
i, j ∈ {1, ..., n}. In this case $B = S_{x_i x_j^{-1}}(A)$.

(ii) T(A) = X' and T(B) = X'. Then we have $h = \mathrm{cind}(A)x_i^{-1} = \mathrm{cind}(B)x_j^{-1}$
for some i, j ∈ {1, ..., n}. This also leads to $B = S_{x_i x_j^{-1}}(A)$.

(iii) T(A) = X and T(B) = X'. Then $h = \mathrm{ind}(A)x_i = \mathrm{cind}(B)x_j^{-1}$ for some
i, j ∈ {1, ..., n}, in which case $B = W_{x_i x_j}(A)$.

(iv) T(A) = X' and T(B) = X. Then $h = \mathrm{cind}(A)x_i^{-1} = \mathrm{ind}(B)x_j$ for some
i, j ∈ {1, ..., n}. This final case results again in $B = W_{x_i x_j}(A)$.

The preceding reflections lead us to the following theorem. 

Theorem 3. Let X be a canonic harmony. Assume a harmony A from the
harmony space $\mathcal{H}(X)$. Then we have:

$$\mathcal{N}_X(A) = \{S_{xy^{-1}}(A), W_{xy}(A) \mid x, y \in X\}.$$

For a given canonic set X, we call the Schritts and Wechsels from the previous
theorem canonic and denote by $\mathcal{S}_X$ and $\mathcal{W}_X$ the sets of all
canonic Schritts and all canonic Wechsels, respectively. Thus, the previous
theorem states that the neighborhood of a harmony A can be obtained by applying
the transformations from $\mathcal{S}_X \cup \mathcal{W}_X$ to it.

For a Wechsel $W_{xy}$ one can distinguish two cases: either $x = y$ or
$x \neq y$. In the former case we call it a tone Wechsel, in the latter an
interval Wechsel. In general, a tone Wechsel can equal an interval Wechsel.
However, they are always distinct if the underlying canonic harmony is primitive.

Corollary 1. Let X be a canonic harmony of cardinality n and
$A \in \mathcal{H}(X)$. Then the neighborhood of A contains up to $n(n-1)$
non-trivial Schritts, up to $n(n-1)/2$ interval Wechsels, and exactly n tone
Wechsels.

Lemma 4. The maximal counts for Schritts and interval Wechsels in the harmonic
neighborhood $\mathcal{N}(X)$ from Corollary 1 are achieved iff X is primitive.
Moreover, in such a case all canonic Schritts and Wechsels are distinct.
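
The counts in Corollary 1 can be checked numerically. The following Python
sketch is an illustration only: it assumes that X = {1, 3, 5} qualifies as a
canonic harmony, represents a harmony hX or hX' as a pair (h, dual), and takes
the tones of hX and hX' to be h·x and h/x for x in X, as suggested by cases
(i)-(iv) above. All helper names are hypothetical.

```python
from fractions import Fraction
from itertools import product

X = [Fraction(1), Fraction(3), Fraction(5)]  # assumed canonic harmony

def tones(harmony, X):
    """Tones of hX (dual=False) or hX' (dual=True)."""
    h, dual = harmony
    return frozenset(h / x if dual else h * x for x in X)

def schritt(q, harmony):
    """S_q keeps the type: hX -> hqX, hX' -> hq^{-1}X'."""
    h, dual = harmony
    return (h / q if dual else h * q, dual)

def wechsel(q, harmony):
    """W_q flips the type: hX -> hqX', hX' -> hq^{-1}X."""
    h, dual = harmony
    return (h / q if dual else h * q, not dual)

def neighborhood(harmony, X):
    """Images of `harmony` under S_{x/y} and W_{xy} for all x, y in X."""
    images = set()
    for x, y in product(X, repeat=2):
        images.add(schritt(x / y, harmony))
        images.add(wechsel(x * y, harmony))
    return images

A = (Fraction(1), False)  # the harmony 1·X
N = neighborhood(A, X)
schritts = {b for b in N if b[1] == A[1] and b != A}  # non-trivial Schritts
wechsels = {b for b in N if b[1] != A[1]}             # tone and interval Wechsels
print(len(schritts), len(wechsels))  # prints "6 6"
```

For n = 3 this gives n(n-1) = 6 non-trivial Schritts and n(n-1)/2 + n = 6
Wechsels, and every harmony in the computed set shares at least one tone with A.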

There is a direct relation between the symmetry set of a canonic harmony
X and the canonic Wechsels of type X. Let $S \in \mathcal{S}(X)$ be an element of
the symmetry set of X. We define the symmetry Wechsel of S acting on
$\mathcal{H}(X)$:


W 5 = W sym 2(£). 


(symmetry Wechsel)

It is easy to see that the symmetry Wechsel is a canonic Wechsel of type X. The
following theorem puts the symmetry sets and canonic Wechsels into an even
stronger relation.

Theorem 4. Let X be a canonic harmony. Then we have:

$$\mathcal{W}_X = \{W_S \mid S \in \mathcal{S}(X)\}.$$

Moreover, if $S_1 \neq S_2$ for $S_1, S_2 \in \mathcal{S}(X)$ then
$W_{S_1} \neq W_{S_2}$.

3.3 Transformational Networks of Neighborhoods 

We assume a primitive canonic harmony $X = \{x_1, \ldots, x_n\}$ of cardinality
n. Our aim is to find a graphical representation of the neighborhood of a harmony
of type X, with the ultimate goal of defining a user interface for manipulating
the harmonies included in the neighborhood. If n = 3 the solution is well known:
the generalized Tonnetz generated by X represents the entire subgroup generated
by the canonic Schritts and Wechsels. There the neighborhoods can easily be
located. The Tonnetz can be elegantly generalized to higher cardinalities through
simplicial lattices of higher dimensions. Gollin [6] explores such a structure for
the case n = 4, where harmonies are represented via 3-simplexes (tetrahedra).
However, this approach is not applicable for our purposes: higher cardinalities
quickly lead to high-dimensional spaces that are difficult to manipulate through
a simple user interface. Therefore, we follow a different route here.

First, let us focus on the simple case of the neighborhood of a single tone
$x_i$ for any $i = 1, \ldots, n$. As discussed above, $\mathcal{N}_X(x_i)$ is
given by $\{S_{x_i x_j^{-1}}, W_{x_i x_j} \mid j = 1, \ldots, n\}$. Any two
harmonies from $\mathcal{N}_X(x_i)$ share exactly one tone if they are of the
same type (two Schritts or two Wechsels) and exactly (we assume primitiveness of
X) two tones if they are of opposite types (one Schritt and one Wechsel). We will
represent them on a circle with 2n positions (or a regular 2n-gon). Distribute
the Wechsels to even positions so that $W_{x_i x_1}, \ldots, W_{x_i x_n}$ are
ordered clockwise, and distribute the Schritts
$S_{x_i x_1^{-1}}, \ldots, S_{x_i x_n^{-1}}$ to odd positions, also ordered
clockwise. As there are n options for the mutual position of the tone Wechsel
$W_{x_i x_i}$ and the identity Schritt $S_{x_i x_i^{-1}} = S_1$, there are n
different constellations for this cyclic construction. Any two neighbors on such
a cycle are related through a Wechsel and, thus, share two common tones.

Now we proceed to the construction of a transformational network for the
neighborhood of a harmony $A \in \mathcal{H}(X)$. First, assume that the
cardinality n of the underlying canonic harmony $X = \{x_1, \ldots, x_n\}$ is
odd. Let $A = \{a_1, \ldots, a_n\}$ where $a_i = a x_i$ for all
$i = 1, \ldots, n$ or $a_i = a x_i^{-1}$ for all $i = 1, \ldots, n$. Draw n
equal circles (or regular 2n-gons) $\gamma_1, \ldots, \gamma_n$ with centers
distributed regularly on an auxiliary circle of the same diameter. Thus, the n
circles have a common intersection in the center of the auxiliary circle. Denote
this intersection $\Gamma$. Moreover, each pair of circles has exactly one more
intersection. (The last condition is not met in the case of even n and,
therefore, we will address that case separately.) The n circles will represent
the cyclic transformational networks of the neighborhoods of the n tones
$a_1, \ldots, a_n$ included in A, ordered clockwise. We distribute the Wechsels
and Schritts of A on the circles in the following way. The interval Wechsel
$W_{x_i x_j}$, $i \neq j$, is located at the intersection
$(\gamma_i \cap \gamma_j) \setminus \{\Gamma\}$, and the tone Wechsels
$W_{x_i x_i}$ are located on the circles $\gamma_i$ opposite the common
intersection $\Gamma$. By now, all Wechsels are evenly distributed along the
circles, occupying exactly n positions on each. The Schritts will be located on
the n positions regularly alternating with the Wechsels. The identity Schritt
$S_1 = S_{x_i x_i^{-1}}$ is put at the common intersection $\Gamma$. All
remaining Schritts are assigned to the corresponding positions on the
corresponding circles, ordered clockwise. We will call this graph the
neighborhood network.
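
The geometric layout just described can be sketched in a few lines of Python.
This is only an illustration under stated assumptions: points are complex
numbers, the common intersection sits at the origin, circle indices run
counterclockwise (mirroring the picture gives the clockwise ordering used in
the text), and the function name is hypothetical. Since all circles have the
same radius and pass through the origin, the second intersection of circles i
and j is the sum of their centers, and the point opposite the common
intersection on circle i is twice its center.

```python
import cmath
import math

def wechsel_positions(n, radius=1.0):
    """Centers of the n circles and the position of each Wechsel W_{x_i x_j}."""
    centers = [cmath.rect(radius, 2 * math.pi * k / n) for k in range(n)]
    positions = {
        (i, j): centers[i] + centers[j]  # i == j: tone Wechsel at twice the center
        for i in range(n)
        for j in range(i, n)
    }
    return centers, positions

centers, positions = wechsel_positions(5)
for (i, j), p in positions.items():
    # Each Wechsel lies on both of its circles and never on the common point.
    assert math.isclose(abs(p - centers[i]), 1.0)
    assert math.isclose(abs(p - centers[j]), 1.0)
    assert abs(p) > 1e-9
```

For even n the sum of two opposite centers is the origin, so the corresponding
interval Wechsel would collide with the common intersection; this is the
failure for even n that the text mentions.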



Fig. 1. Neighborhood network for a three-note harmony. Left: Neighborhood network
for harmony A = {a, b, c}. Right: Comparison with the standard neo-Riemannian
case, the harmony space of major and minor triads.


Figure 1 illustrates the simple case when n = 3. The left side of the figure
shows the neighborhood network for a three-element harmony A = {a, b, c}. As
mentioned above, this simple case is easily modelled by the Tonnetz, and the
Schritts and Wechsels correspond to well-known neo-Riemannian transformations.
The right side of the figure provides the comparison.



[Figure 2 legend: node labels denote Schritt $S_z(H)$ and Wechsel $W_z(H)$.]


Fig. 2. Neighborhood network for a five-note harmony {a, b, c, d, e}.

Figure 2 shows the neighborhood network for harmony spaces based on five-note
canonic harmonies. There is a relation between the five-note case and the
famous Penrose tiling. Schritts and Wechsels from the neighborhood can be
assigned to pentagons of the central part of Penrose's [16] original irregular
tiling.



Fig. 3. Neighborhood network for a four-note harmony. Left: A generic case for a
harmony {a, b, c, d}. Right: Neighborhood network of a natural seventh chord GB,DF\.


To draw neighborhood networks for harmonies with an even number of tones
we use a simple trick. For n even we consider the neighborhood network for
cardinality n + 1. Then we remove one of the circles and the corresponding
Schritts and Wechsels. The left side of Fig. 3 shows the neighborhood network
obtained via such a procedure for the case n = 4. The right side of Fig. 3
illustrates the neighborhood network for the natural seventh chord GB,DF\ from
the harmony space $\mathcal{H}(1, 3, 5, 7)$.



Fig. 4. Neighborhood network for a seven-note harmony.

Finally, Fig. 4 shows neighborhood networks for seven-element harmonies.
Again, the neighborhood network for a six-element harmony could be derived 
from that network by omitting one of the circles and the corresponding Schritts 
and Wechsels.

Abstract. Argentine Tango confronts dancers with specific challenges. Since
the dance is improvised, the leader is expected to follow patterns and
trends in the music on the fly. While musicians have an advantage, many
beginners prove unequal to the task, and are often driven to give up.
Compass Trainer is a piece of smartphone software intended to help
dancers feel the ‘Compas’, the rhythmic pulse of the dance, and integrate
it in their movements. Its development blended theoretical and
down-to-earth, practical considerations. Our team had to take into account the
mixed rhythmical structure (binary with a ternary component); explore
signal processing techniques such as beat tracking; interview tango
maestros and musicians; and build a mobile application to help our users
discover the rhythmical layers of a tango track. Experimentation in Tango
classes was rewardingly successful.


Keywords: Tango • Dancing • Rhythm • Beat-track • Pedagogical software •
Pedagogy studies


1 Introduction 

In many social dances, the strong binary pulsation helps the dancers move their
feet onto the beat. However, the same is not true in Argentinian Tango. For
several reasons that we tentatively explore in Sect. 2, many tango dancers are
uncomfortable with the ongoing pulsation, which is as bad a translation as any
for the almost mystical Spanish term ‘Compas’. The complexity of tango
music, blending melodic lines with sometimes irregular rhythms, is a challenge
for the leader. Quoting [3]:

In tango music, the melody is often given a rhythmic treatment. It is often 
impossible to separate them entirely, but learning to discriminate between 
the melody and the rhythm is one of the most important skills of all. 

© Springer International Publishing AG 2017

This is especially cruel for leaders, because their follower sometimes has a
better perception of rhythm, with tragic effect on the harmony of the dancing
couple.

One of the authors (see Sect. 3) was sensitive enough to the issue to undertake
the development of software intended to help tango dancers with their
perception of the Compas, and to provide a tool for learning to dance to the
music, be it alone, as a couple, at home, or during a tango class. This involved
gathering a team, with a computer programmer, a music theorist, tango teachers,
a media communication expert, etc., and meeting with tango musicians, maestro
dancers and high-tech hardware developers. The present paper reports some of the
difficulties we met, and the choices we made to solve them. The resulting
application (available online) could also be used by dancers of any level, by
teachers, and by researchers in pedagogy and in the relationship between musical
and dancing gestures (as suggested by a reviewer). The present paper begins with
a general explanation of tango-specific difficulties, followed by a dancer’s
testimony and the goals, elaboration and testing of the software.

2 Difficulties in Dancing Tango 

There are some rare, documented cases of people having difficulties following
a rhythm and coordinating their movements with it [4]. Experimental studies
rather tend to show that the ability to move in rhythm is well anchored
in human behavior and is a characteristic feature of human beings. 1 Indeed
most people can dance to a beat in a disco and easily learn to move in rhythm in
most dances: rock, salsa, bachata, swing, rumba, waltz... Why then do so many
leaders find it so difficult to guide their steps in accordance with the pace of
tango music? Listening to https://www.youtube.com/watch?v=N-FyPMYjFpc 2 may
hint at some of the problems involved (Fig. 1).


2.1 Practice of Argentinian Tango Dancing 

Argentinian Tango is an improvised dance with a leader and a follower.
Adaptation and flexibility are required: most often, the leader invites a
follower for a series of three or four pieces (a tanda), and switches to another
for the next tanda. The follower mostly fulfills (as best she can) the leader’s
suggestions. 3
Hence the leader is responsible for his own movements, for guiding precisely and 
unambiguously his partner’s, and for the circulation of the couple among the 

1 Or sea lions. See [2]. 

2 Astor Piazzolla’s Contratiempo played by Pichuco (Anibal Troilo)’s orchestra,
where the former began as a bandoneonist. Many DJs shy away from Piazzolla for
fear that the difficulty of his music may discourage dancers; conversely, many
dancers begrudge this ban and ask for more challenging and delightful music! A
frequent compromise is to have a couple of more difficult tangos (Piazzolla,
Pugliese) but not before 2 a.m.

3 We discuss common practice. Of course, experts can develop more freedom and
originality.

Fig. 1. Screen copy of the Compass Trainer app


other dancers (walking anticlockwise round a circle in the ballroom and
developing more complicated moves around the rueda when traffic allows). The
best dancing follows the music closely - what this means precisely and how it is
achieved are central topics in this paper. This involves stepping on the beats,
or only on the strong beats, or sometimes on half-beats (contratiempo); and
taking into account pauses/rests. Choosing the placement of steps presupposes
perception of the different instrumental lines (which may be staccato or legato,
sometimes superimposed), and above all discerning the Compas, the pulsation
(which may be hidden by violent syncopation or other rhythmic accidents). The
following overview is perforce cursory, and counter-examples could easily be
found to all of its assertions; still it purports to picture with sufficient
accuracy most of the music that a milonguero will encounter at a typical Milonga
(Tango ball).


2.2 Three Genres 

This bewildering avalanche of difficulties is concentrated in Tango proper: there
are essentially two other genres in Tango balls, Milonga and Vals Criollo (a.k.a.
‘Peruano’), whose beats are much more regular. The Milonga is binary (in 2/4)
and fast (between 92 and 120 beats per minute); Vals is of course ternary,
usually with a clear strong first beat. We did test our work on some valses and
milongas for comparison purposes but mostly focused on Tango.


2.3 Complexity of Beat 

It is quite clear that Tango rhythms are much more complicated than those of
most dance musics, especially the North American ones originating in two-step,
Swing and its cousins (Rock and its prolific family); not to mention the
rhythmically poor Dance, Techno, electro-techno and their ilk. One explanation
is the frequent occurrence of typical rhythmic figures such as the Habanera
rhythm (or ‘Tango congo’) 4 : dotted eighth, sixteenth, eighth, eighth. The
sixteenth note hints at a ternary component in this pattern, which was
eventually pinpointed (skipping the eighth note on beat 2) in many tangos from
the fifties onward 5 , and is known as tresillo, cf. Fig. 2. The latter is
actually a maximally regular division of the period 8 in three parts, generated
by the interval 3 in accordance with the general theory of Maximally Even
Sets. 6 The presence of a ternary component is apparent in the Fourier spectrum
of the Habanera rhythm, which is fairly evenly distributed between different
periodicities; and even more so in that of the tresillo.
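
The remark about Fourier spectra can be illustrated with a short Python sketch.
It is an illustration under assumptions, not the authors' computation: each
rhythm is encoded as a set of onsets on a grid of 8 sixteenth notes (Habanera at
positions 0, 3, 4, 6; tresillo generated by the interval 3), and the function
name is hypothetical.

```python
import cmath

PERIOD = 8  # sixteenth-note positions in one 2/4 bar
HABANERA = [0, 3, 4, 6]
TRESILLO = [(3 * k) % PERIOD for k in range(3)]  # generated by the interval 3

def spectrum(onsets, period=PERIOD):
    """Magnitudes of the discrete Fourier coefficients k = 1 .. period // 2."""
    return {
        k: abs(sum(cmath.exp(-2j * cmath.pi * k * t / period) for t in onsets))
        for k in range(1, period // 2 + 1)
    }

habanera_spectrum = spectrum(HABANERA)
tresillo_spectrum = spectrum(TRESILLO)
for k in habanera_spectrum:
    print(k, round(habanera_spectrum[k], 2), round(tresillo_spectrum[k], 2))
```

With this encoding the Habanera magnitudes range from about 0.77 (k = 1) to 2.0
(k = 4), with a sizeable value of about 1.85 at k = 3, whereas the tresillo
peaks clearly at k = 3 (about 2.41), the coefficient for three cycles per bar.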

In addition to the habanera cell, already famous through the eponymous
Habanera in G. Bizet’s Carmen, another typical syncopated rhythmic motif in
early tango (see Fig. 3) is actually used as a marker of the genre by
classical composers: see Debussy’s Une soiree dans Grenade, Albeniz’s Tango
Espagnol, Satie’s Tango perpetuel... 7

It is illuminating to compare (Fig. 2) with a two-beat rock music, with its
definitely binary character, and at the other end of the spectrum, with the clave,

4 Spanish in origin, the Habanera is but one of the numerous ancestors of Tango music. 

5 A most famous example of the superposition of binary and ternary is Piazzolla’s 
Libertango. 

6 Because $3 \times 3 \equiv 1 \pmod 8$. With an eye on the following discussion,
a reference is [1]. Note that the complement set, the cinquillo ‘filling the
background of’ tresillo, was actually used as a ternary pattern by Astor
Piazzolla - just as a complement form of the Clave rhythm sometimes occurs in
Cuban music.

7 We find triplets in tangos composed by classical musicians, more than in
popular Tango, cf. Ravel’s Piece en forme de Habanera for instance.

Fig. 3. A typical early-tango rhythmic cell


another complex rhythm occurring in salsa and other Latin American dances. The
latter is even more complex than the Habanera; however, a Salsa dancer will,
just like a Rock dancer, step on the binary pulsation only: once the first beat of
the ever-repeating Salsa rhythm is identified, both dancers can carry on iterating
their (rhythmically speaking) simplistic four-step sequence.

Historically speaking, the density of off-beat accents (syncopations) increased
with time, from none at all in the primitive Canyengue style (end of the 19th
century), to twice a beat in late Pugliese or Piazzolla music and Tango Nuevo
(1950-1980); recently the pendulum swung back to few syncopations in
Electro-Tango, which features more repetitive and binary rhythms.

Another typical trait is the Arrastre, a precipitous acceleration of a sequence
of notes. It can be understood as a specific form of rubato, usually played just
before a strong beat. It evolved into the extreme form of the Yum-ba with Osvaldo
Pugliese’s eponymous tango (1946), where the piano slaps a rumble and tumble
of indistinguishable notes just before the beat. 8

Irregularities also occur on a larger time-scale, with variations in the
repetitions which usually enrich the rhythm, more or less intense rubato, rests,
changes of tempo.

Undoubtedly this wealth of musical variety enhances the interest and pleasure
of the listener; for the dancer however, it induces difficulty on two accounts:

1. The perception of the Compas may be difficult - this is precisely what the
software Compass Trainer purports to remedy;


8 According to Pugliese, the actual notes had little importance because of the poor
quality of the instruments on which he played at the time!

2. The rhythmic complexities can and should be taken into account in the
leading. For instance, strong offbeat accents can hardly be ignored in the dance,
ends of phrases and rests should induce suspensions in the movement, and
so on.


2.4 Instrumentation 

The music ensembles used in Tango changed ceaselessly over a century or
so. Astor Piazzolla even introduced percussion (including a vibraphone!) and
an electric guitar in the 70’s, and modern Electrotango (Otros Aires, Gotan
Project) uses beatboxes. But most balls cling to more traditional, even
traditionalist, ensembles, typically using bandoneons, violins, piano, a double
bass and sometimes lyrics (tango cancion).

Most instruments can and will play either staccato or legato, the strings
pizzicato or arco; sometimes alternately (so typical of Carlos Di Sarli),
sometimes in counterpoint, sometimes both in the same phrase. An expert dancer
can even dance following one instrument and guide his partner to dance on
another - but such dancers do not need our software.

3 The Problem 

We have analyzed some of the difficulties - specific to Tango dancing - that 
may arise, from an academic viewpoint. From this perspective, there are no 
compelling reasons to invest time, money and dedication into the research and 
production of a piece of software remedying these difficulties. In this section, 
we change tack and recall the vividly painful experience of one of the authors, 
who was thus moved to launch the Compass Trainer adventure. In step with the 
unexpected changes of rhythm or style in Tango music, we switch from academic 
style to first-person writing. 

A Woeful Though Common Story 

My first 18 months as a tango dancer were Hell... I did suffer so much 
that many times I was about to give up - it took two years for tango to 
turn into a pleasurable activity with plenty of good music in the arms of 
multiple ladies. However, it was not easy to diagnose the reasons. 

When discussing with experienced tango classmates, I recognized that each 
had a different way to live as a happy dancer, even if most of them had 
experienced painful starts. 

Even more interesting were the interviews of those ‘lost in dancelation’. 9
I did try to get in touch with all the people giving up tango after a few
months or a couple of years. Basically, we are talking about half of the


9 A word forged for expressing the transformation of an ordinary individual into a
dancer.

people who did start a tango course. The waste is just enormous, especially
on the masculine side of the population.

Most of these men shared a similar story: tango is ‘too much’ for uneducated
ears and feet. Among the men starting tango in the second half of
their life, many come without regular practice of sport or music. Learning
tango appears as an insuperable mountain to climb, generating more
frustration than pleasure.

Why? 

Leading in tango, which is overwhelmingly the men’s part, combines four 
major responsibilities: the motion, the partner, the space and the music. 
We will not sort them by importance, but the four are crucial to becoming 
a happy tango dancer. 

Motion is steps (forward, backward, both sides, pivots, revolving, etc.) 
combined with torsion between hips and chest ( disociacion ). 

Partner means that the leader must know continuously where the partner
is, his/her orientation (hip and chest), the weight distribution between his/her
feet, the momentum of his/her motion, his/her stability.

Space is the few square feet on which the couple can move (or stand). 
This space can vary from several feet to a few inches, depending on how 
crowded the dance floor is. 

Music is the raw material that both dancers will transform into motion. 
Initially, the leader creates a motion to be followed by his partner. Happy 
dancers go way beyond this partition: the leader initiates a motion to be 
harmoniously exchanged, enriched, and playfully challenged by his partner. 
Tango then becomes a peer-to-peer, witty, whimsical and artsy dialog. 
Going deeper into the analysis, two fundamental skills are found essential 
for the ‘dancelation’: good feet and good ears. 

Good feet are the basis for the muscle tone required for precise motion.

Good ears are the natural way to integrate music. Tango music sustains
different layers (rhythm, melody, song, etc.) which are as many propositions
for the dancer. The dancer is just another instrument joining the
orchestra and playing with it. With experience, one can see by watching
a dancer which instrument he/she follows and mimics. The beginner in
Tango will usually try to follow the rhythm given by the double bass: a
step on every strong beat, a pause when the instrument does not play, etc.
Later on, other musical layers will be taken into account by the dancer,
but the starting point is always the same: follow the bass! Easier said than
done: the double bass plays soft notes, hard to hear. 10 Most tango beginners,
if asked to listen to the bass and follow it, will find the exercise a
painful and frustrating challenge.


10 As observed by [3] and other DJs and professionals, much of the bass dynamics
is lost through (old) recordings and cheap audio systems - not to mention the gut
strings used of old for the bass. This is also true of most harmonics (above
5-7 kHz), but this is less of an issue for the perception of rhythm.

Taking all this into consideration, I decided it was high time someone provided
these unhappy creatures with simple means to improve their feet
and ears, in order to reduce their frustration and enhance their experience
and learning of tango. Thinking on it, I came up with specifications
for a smartphone application that would highlight the double bass, as a
reference time-line to help dancers practice and progress.

A team was then gathered and financed, comprising programmers, dancers,
tango teachers and a music theorist, covering the required wide field of
expertise. We uncovered a wealth of ideas and avenues for development. We had to
choose which were to be expanded and which had to wait for the software to reach
economic sustainability. What we endeavored to do, and what we eventually
managed, is the topic of the next section.

4 Compass Trainer: What, Why, How 

4.1 Goal and Means 

As outlined in the preceding section, the project started with a large spectrum
of possible objectives:

- Extraction (in real time) of relevant rhythmic information.

- Extraction (for post-processing) of musical information: recognition of
rhythmic cells, segmentation in phrases and sub-phrases, by timbre, by character,
by frequency (treble/medium/bass), by loudness (melody, accompaniment).

- Possibility of selection (filtering) by instrument, tempo, etc.

- Providing several superposable layers of information: beat-tracks by
instrument, strong beats, weaker beats, climaxes, ends of phrases, staccatos,
legatos.

- Analysis of finer aspects of the step of an expert tango dancer (changes
in the distribution of pressure on ball, toes, sides) for calibration. Analysis
and recording of the steps of the dancer(s) in real time and correction against
the recorded values.

- Establishing a playlist of copyright-free tango music that could be analysed
and used in the app.

The team took several months examining what was actually feasible, or
desirable, leading to several reformulations of objectives. Many algorithms for
beat tracking were tested, diverse renderings of the rhythmic information (which
had to be suitable for a phone’s screen) were tried, most discarded. At this
stage, with wider, deeper, and less shaky foundations for the project, we decided
to enlist some more advice.


4.2 Consulting Experts 

At several stages we fruitfully sought expert advice from maestros, musicians,
teachers, and the eventual end-users, ordinary milongueros. Tango musicians (from
the band Silbando) for instance confirmed the need for enhanced beats for the
dancers, explaining how they played rhythm differently in a ball or a concert:
less rubato, stronger accents especially in piano and bass. To quote the leader
Chloe Pfeiffer, “in a ball we have to hear what we are playing above the din”!

We chose to abandon for the time being the question of feet tonicity and
precision (along with precise measurements of the position and movement of
dancers, which are not yet supported by smartphone technology), but some
experiments were made and data saved for future research, with sensors on the
sole and recordings of maestros dancing with coordinated video, abstraction of
the mechanics of dancers (software-reduced to articulated skeletons), the beat
track and sole pressure variations on 17 points per foot.

Though this data does not directly reflect on the current state of the app,
it is anything but anecdotal: for instance, video recording compared with beat
tracks showed that those maestros who were most vocal about the imperious
necessity of ‘following the double bass’... did not!


4.3 Abstracting the Relevant Rhythmic Information 

As sketched above, we had occasion to explore several techniques for MIR (Music
Information Retrieval), constituting an ecosystem of open-source apps dedicated
to the annotation of audio files with visual and/or audio markers produced in
sync with the music.

The first layer is obtaining data about the beats or onsets of the piece of
music. We needed absolute temporal measurements (down to millisecond precision)
against which to position the piece’s meter, specifying any musical element’s
position in a bar and relative to the different beats/sub-beats in the bar.

Among other open-source solutions, we chose Sonic Visualiser (see Fig. 4
below). Developed at Queen Mary University of London, Sonic Visualiser is
dedicated to facilitating the analysis and visualisation of musical tracks. Units
for signal processing are available as plugins in Vamp format, allowing analysis,
editing and export of processed data.

In the simplest cases (when the rhythmic character is more robust and 
clearcut, for instance in Gotan Project’s Epoca ), the plugins dedicated to bar 
and beat tracking in the standard “Queen Mary Plugin Set” provided quite 
honorable results. 

However, when the music exhibits large tempo variations (rubati or slowing
down to a fermata), or features a singer, most beat detection algorithms go off
the rails. In the first case, we found that some recent algorithms based on
pre-trained neural networks [10,11] do a fair job of following the players’
rubati, essentially extrapolating from the memory of the preceding beats - a fair
job, but not perfect. Nonetheless, many traditional tangos, with old but decent
recordings (1920-1950), could be processed entirely automatically (for instance
Di Sarli’s La Morocha).

Optimizing the algorithms’ efficiency did require some external knowledge
and user-parametrization. 11 In summary, the algorithmic detection provides a
substantial reduction of the analysis time (bearing in mind that we aimed at
building, in time, a library of hundreds of annotated music pieces) but some
(qualified) human intervention, annotation and correction is still required,
especially when musicians distort the meter a lot.

The second layer was implemented as Ruby scripts [14]. It remedied some
defects in the outputs of the first layer, and formatted all the data extracted
from the sources and their processing. For instance, all audio was normalized to
.wav format. Besides, a data layer describing some formal musical information
(e.g., bars and beats) was added, abstracting this information independently of
the fluctuations of tempo. Here is a sample of the format retained for encoding
a single beat:

    "abs_beat": 3.0,
    "bar": 1.0,
    "beat": 3.0,
    "bpb": 4.0,
    "timestamp": 0.35736961371046183

This represents the third beat since the beginning of the piece (abs_beat), it 
occurs as the third beat of the first bar, the time-signature for the whole piece 
being four quarter notes per bar (4/4); finally, this beat strikes at 0.357... seconds 
from the start. 
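
To make the record layout concrete, here is a minimal Python sketch that builds
such records from a list of beat times. It is not the authors' Ruby code: the
function name is hypothetical, and it assumes a constant number of beats per bar
and that the first timestamp is beat 1 of bar 1.

```python
def annotate_beats(timestamps, bpb=4):
    """Build one record per beat from beat times given in seconds.

    Assumes (hypothetically) a constant meter and a piece starting on a downbeat.
    """
    records = []
    for index, timestamp in enumerate(timestamps):
        records.append({
            "abs_beat": float(index + 1),      # beat count since the start
            "bar": float(index // bpb + 1),    # bar number, starting at 1
            "beat": float(index % bpb + 1),    # position of the beat in its bar
            "bpb": float(bpb),                 # beats per bar
            "timestamp": timestamp,            # seconds from the start
        })
    return records

records = annotate_beats([0.1, 0.2, 0.357, 0.5, 0.6])
print(records[2])
```

The third record has abs_beat 3.0, bar 1.0 and beat 3.0, matching the layout of
the sample above; the fifth record starts bar 2.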

This library of Ruby scripts makes it possible to insert other beat tracks,
currently either a track processed from the extraction of the double-bass line
(mostly by frequency filtering and beat tracking) or user-made claps inserted
into the rhythmic model of the piece. Each clap is encoded as a floating-point
number in order to respect the musicians’ freedom in departing from the strict
meter of the model. It is also possible to quantize them, in order to align the
events exactly with the meter. No fewer than nine such different beat-tracks
appear as vertical colored lines in Fig. 4.
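
Quantization of claps can be sketched as snapping each clap to the nearest beat
of the model. The following Python function is a hypothetical illustration, not
the authors' implementation; it assumes a non-empty list of beat times in
seconds and resolves ties in favor of the earlier beat.

```python
from bisect import bisect_left

def quantize(claps, beats):
    """Snap each clap time to the nearest beat time (ties go to the earlier beat)."""
    beats = sorted(beats)
    snapped = []
    for clap in claps:
        i = bisect_left(beats, clap)
        # Nearest candidates: the beat just before and the beat at or after the clap.
        candidates = beats[max(i - 1, 0):i + 1]
        snapped.append(min(candidates, key=lambda b: abs(b - clap)))
    return snapped

print(quantize([0.98, 2.6], [0.0, 1.0, 2.0, 3.0]))  # prints [1.0, 3.0]
```

Keeping the unquantized floating-point values alongside the snapped ones would
preserve the expressive deviations that the text wants to respect.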

A movie trailer can be viewed and heard at http://www.kbcoo.net/. 


4.4 User Interface 

In practice we have several rhythmic lines that we can switch on or off at
will. In the public online version 1.0, the app can provide the dancer with a
metronomic line, or just the strong beats, or the half-beats, or the double-bass
track, or any combination thereof. This fulfills one of our aims, allowing different
choices on the same music for practice and pedagogical purposes. The idea of
using the phone’s buzzer was considered but left aside, at least for the time
being. We have preferred to focus on sonic and visual signalization. Indeed, a
substantial part of our research was devoted to the visualization of the music,
focusing on the perception of rhythm but also the energy variations (both in a
physical and aesthetic sense: accents, accelerations, dramatic fermatas...) of
a piece. Visual information is absolutely necessary for the dancer to anticipate
what is going to happen. We tried numerous prototypes in order to visualize
energy. Since there was too much information on a full spectrogram (see Fig. 5,
where frequency ranges are already abstracted as superposed color bands), we
retained a presentation in frequency bands roughly following the distribution of
the instruments into bass, accompaniment and melody, producing a reduced
spectrogram with three different frequency range levels coded in three colors,
and vertical width proportional to their energy (Fig. 6). This visualization still
appears complex at first glance, but one soon gets used to it and appreciates the
wealth of useful information that it provides.

11 Letting the computer believe that a waltz has a binary time signature leads to
hilarious results. Though automatic detection of a ternary signature has been
possible for some time (cf. [9] for instance), we found it simpler to just add the
information manually.

Fig. 4. A passage of d’Arienzo’s “El Aeroplano” viewed in Sonic Visualiser

Fig. 5. Full spectrogram for Manuel Joves’s “Loca” (Color figure online)

Fig. 6. Reduction of “Loca” to the energies of three spectral bandwidths (Color
figure online)


5 Conclusion 

Despite all its fascinating incursions into theoretical territory, Compass Trainer
is a commercial project available for iOS and Android. This entailed some
simplifications (mentioned above) of the data proffered to the end-user, and of
course beta-testing in actual Tango classes and showing the software to
professional Tango masters.

To sum up, Compass Trainer proved to be a very effective tool for its
intended purpose, when used by an experienced teacher. Quoting the field notes
of one of the first tests:

This is my first experimentation with Compass Trainer, using the
computer-generated click tracks on F. Canaro’s milonga “Silueta Porteña”.

In this class, there are three different levels of practitioners [including 
complete beginners in Milonga style]. 

After listening once to the music, dancers are invited to walk solo on it; 
most are unable to follow anything like the strong beats or Compas. The 
teacher claps her hands to help them understand when to step [or transfer 
weight on the other foot]. Then Compass Trainer is launched: 

(1) First with the Compas and media-tempo beat tracks. All students are 
now able to walk in Compass, almost instantly. 

(2) Then with the double bass beat-track. With some preliminary work (listening
to diverse characteristic Milonga rhythms), students quickly identify
the typical patterns and begin to play (solo) with them. The use of this 
track is immediate and enjoyable. 

Overall, using Compass Trainer is a huge bonus: after some time dancing 
on the augmented musics, every student was able to dance several milongas 
correctly in phase with the strong beats, without even needing a dry run 
of the first bars. No need for the teacher to clap the beats anymore, freeing 
time for individual fine-tuning instead. The only drawback was that the 
will to test fully the software perturbed the flow of the class - this just 
has to be thought of ahead of time. 

We also probed professional circles, cautiously, since we wanted to establish
cooperation, not competition, and had to make clear that home practice with
the software could complement tango classes, not replace them. So far the
maestros’ reactions have been quite positive.

In this project, we studied the cutting-edge research on beat and meter
detection and selected what suited our specific needs, fulfilling our primary goal of
providing the listener with a usable rendition of several layers of rhythmic content
present in the tango music as perceived by a dancer. Interestingly, however, we
did require at least one state-of-the-art module making use of pre-trained Neural
Networks for the detection of the Compas proper, mimicking both the musical
culture of an educated listener and the predictive hearing as triggered by the
first beats of a piece, just as a human milonguero does, gently balancing himself
and his partner from side to side for a few bars before embarking on the first
step of the dance.

Abstract. Uniform strings have a harmonic sound; nonuniform strings
have an inharmonic sound. This paper experiments with musical instruments
based on nonuniform/inharmonic strings. Given a precise description
of the string, its spectrum can be calculated using standard techniques.
Dissonance curves are used to motivate specific choices of spectrum.
A particular inharmonic string consisting of three segments (two
equal unwound segments surrounding a thicker wound portion) is used in
the construction of the hyperpiano. A second experiment designs a string
with overtones that lie on steps of the 10-tone equal tempered scale. The
strings are sampled, and digital (software) versions of the instruments
are made available along with a call for composers interested in writing
for these new instruments.


1 Ideal and Non-ideal Strings 

An ideal string vibrates in a periodic fashion and the overtones of the spectrum
are located at exact multiples of the fundamental frequency, as required by the
one-dimensional linear wave equation [6]. When the string deviates from uniformity,
the overtones depart from the harmonic relationship and the sound becomes 
inharmonic. There is only one way to be uniform, but there are many ways to 
be nonuniform; there is only one way to be harmonic, but many ways to be 
inharmonic. 

In a “prepared piano,” weights and other objects are placed in contact with 
the string, giving it a nonuniform density and a sound that can be described as 
bell-like, metallic, or gong-like. Such preparations tend to lack detailed control. 
Our recent work [4] explores one possible musical instrument and system, the
hyperpiano, which is based on a particular inharmonic string consisting of three
connected components. The design approach is illustrated in Fig. 1(a) where a 
nonuniform string is conceptualized as consisting of a sequence of connected 
segments, each of which is uniform. By carefully controlling the string segments, 
a large variety of inharmonic effects may be achieved. 

The invention of new musical instruments and ways to tune and play them
has a long history [2,12] and continues to the present, though modern approaches
are often based on digital rather than analog sound production [11]. From an
acoustical point of view, the idea of designing an instrument based on an
inharmonic sound contrasts with the more common approach of beginning with an
inharmonic vibrating element and trying to make it more harmonic.

Fig. 1. A nonuniform string can be thought of as a sequence of connected uniform
strings, shown schematically in (a) with three segments characterized by their mass
densities $\nu_1$, $\nu_2$ and $\nu_3$ and lengths. A typical round-wound string (b), as
commonly used for guitar and piano strings, has a solid metal core with its own mass
density. The winding (plus inner core) may be characterized as a uniform string with a
second mass density.

While there is no conceptual difficulty in imagining a string with an arbitrary 
density profile, it is not easy to fabricate strings with oddly varying contours. 
For certain specific densities, and for a small number of segments, it is possible 
to exploit the structure of commonly manufactured strings. A wound string,
as shown schematically in Fig. 1(b), consists of a core metal wire surrounded
by a second wire wrapped around the outside (the figure labels the core and the
wound portion with separate mass densities). Stripping away the winding from a portion of the string
effectively creates a segmented string that is readily available from commercial 
sources, and this is how the strings of the hyperpiano are made. 

The completed hyperpiano (see Fig. 2) is discussed in some detail in Sect. 2, 
and its musical system, based on the hyperoctave, is outlined. An inherent 
problem for any new musical instrument is to find composers to write for it 
and performers to play it; we address this issue by making digital simulations
(software-based sample playback modules) of the hyperpiano available for download
(http://sethares.engr.wisc.edu/papers/hyperInstruments.html) [19].



Fig. 2. The completed hyperpiano and a closeup of the nonuniform strings. The
instrument can be seen and heard “in action” in Video Example 1
(http://sethares.engr.wisc.edu/papers/hyperOctave.html). The nonuniform strings of
the hyperpiano have inharmonic spectra: the first five overtones occur at ratios
(1, 1.79, 2.84, 4.02, 5.08).

Given an inharmonic string, one way to characterize the quality of the resulting
musical system is to draw the dissonance curve. This provides a way of locating
the intervals that are maximally consonant and hence provides a candidate
tuning for the instrument. Indeed, several examples of such systems are shown
in [16], but most are based on electronic sound synthesis. The use of inharmonic
strings provides a physical analog. Section 3 uses the specification obtained from
dissonance curves to design a nonuniform segmented string, and we verify that
the overtones match the locations of 10-tet scale steps. In order to expose
composers and musicians to this new musical system, we present the design, the
string, and a software emulation that can be easily downloaded.

2 The Hyperpiano 

In [4], using the techniques of [5], we simulated a large number of different 
inharmonic strings of the form of Fig. 1(a), each with three segments. Visualizing 
the dissonance curves was useful since it allowed a rapid overview of the behavior 
of a given design over all possible intervals. Eventually, we chose a particular 
design in which each string has the (unwound, wound, unwound) lengths of
$\ell_1 = 12\%$, $\ell_2 = 9.6\%$, and $\ell_3 = 78.4\%$, with densities
$\nu_1 = \nu_3 = 0.00722$ and $\nu_2 = 0.0276$ kg/m. These strings have the
spectrum shown in Fig. 2.
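
To make the link between the string design and its spectrum concrete, here is a rough sketch in Python. It assumes an ideal, perfectly flexible string under uniform tension with rigidly fixed ends, solves the wave equation on each uniform segment, and matches displacement and slope at the joints; this is our simplified model, not necessarily the procedure of [5]. The function names are hypothetical, and the overtone ratios do not depend on the tension or on the total length.

```python
import math


def end_displacement(omega, lengths, densities, tension=1.0):
    """Displacement at the far end of a string clamped at x = 0.

    On each uniform segment y'' = -k^2 y with k = omega * sqrt(density / tension);
    displacement and slope are carried continuously across the joints.
    """
    y, slope = 0.0, 1.0
    for length, density in zip(lengths, densities):
        k = omega * math.sqrt(density / tension)
        c, s = math.cos(k * length), math.sin(k * length)
        y, slope = y * c + slope * s / k, -y * k * s + slope * c
    return y


def overtone_ratios(lengths, densities, count=5):
    """Ratios of the first `count` mode frequencies to the lowest one.

    Modes are the angular frequencies at which the far end is also at rest.
    """
    # Scan step: a small fraction of the fundamental of a comparable uniform string.
    travel = sum(l * math.sqrt(d) for l, d in zip(lengths, densities))
    step = (math.pi / travel) / 200.0
    roots, lo = [], step
    f_lo = end_displacement(lo, lengths, densities)
    while len(roots) < count:
        hi = lo + step
        f_hi = end_displacement(hi, lengths, densities)
        if f_lo * f_hi < 0.0 or f_hi == 0.0:   # a mode lies in (lo, hi]
            a, b, fa = lo, hi, f_lo
            for _ in range(60):                # bisection refinement
                m = 0.5 * (a + b)
                fm = end_displacement(m, lengths, densities)
                if fa * fm <= 0.0:
                    b = m
                else:
                    a, fa = m, fm
            roots.append(0.5 * (a + b))
        lo, f_lo = hi, f_hi
    return [w / roots[0] for w in roots]


# Hyperpiano design: (unwound, wound, unwound) as fractions of the length, kg/m.
ratios = overtone_ratios([0.12, 0.096, 0.784], [0.00722, 0.0276, 0.00722])
print([round(r, 2) for r in ratios])
```

Under these assumptions the printed ratios should come out close to those quoted in the caption of Fig. 2, while a single uniform segment gives the harmonic series 1, 2, 3, 4, 5.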

The psychoacoustic work of R. Plomp and W.J.M. Levelt [14] provides a
basis on which to build a measure of sensory dissonance. In their experiments,
Plomp and Levelt asked volunteers to rate the perceived dissonance or roughness
of pairs of pure sine waves. In general, the dissonance is minimum at unison,
increases rapidly to its maximum somewhere near one quarter of the critical
bandwidth, and then decreases steadily back towards zero. When considering
timbres that are more complex than pure sine waves, dissonance can be calculated
by summing up the dissonances between all pairs of partials, weighting them
according to their relative amplitudes. For harmonic timbres, this leads to curves
having local minima (intervals of local maximum consonance) at small integer
ratios (as in Fig. 3(a)), which occur near many of the steps of the 12-tone equal
tempered scale. Similar curves can be drawn for inharmonic timbres [15], though
the points of local consonance are generally unrelated to the steps and intervals
of the 12-tone equal tempered scale.
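
The construction of a dissonance curve can be sketched in a few lines of Python. The pairwise roughness function below uses a widely quoted exponential parameterization of the Plomp-Levelt curves (constants 3.5, 5.75, 0.24, 0.021 and 19); these constants, the reference pitch and the function names are assumptions for illustration and are not taken from this paper.

```python
import math


def pair_dissonance(f1, f2, a1=1.0, a2=1.0):
    """Roughness of two sine partials (assumed Plomp-Levelt-style fit)."""
    s = 0.24 / (0.021 * min(f1, f2) + 19.0)   # scales with the critical band
    x = s * abs(f2 - f1)
    return a1 * a2 * (math.exp(-3.5 * x) - math.exp(-5.75 * x))


def dissonance(interval, ratios, f0=261.63):
    """Summed dissonance of a tone and its copy transposed by `interval`.

    `ratios` lists the partials relative to the fundamental; all partials
    have equal amplitude, as in the five-partial curves of the figure.
    """
    partials = [f0 * r for r in ratios] + [f0 * interval * r for r in ratios]
    total = 0.0
    for i in range(len(partials)):
        for j in range(i + 1, len(partials)):
            total += pair_dissonance(partials[i], partials[j])
    return total


def local_minima(ratios, max_cents, step=5):
    """Grid points (in cents) where the dissonance curve has a local minimum."""
    cents = list(range(0, max_cents + step, step))
    values = [dissonance(2 ** (c / 1200), ratios) for c in cents]
    return [cents[i] for i in range(1, len(cents) - 1)
            if values[i] < values[i - 1] and values[i] < values[i + 1]]


print(local_minima([1, 2, 3, 4, 5], 1250))                 # harmonic timbre
print(local_minima([1, 1.79, 2.84, 4.02, 5.08], 2450))     # hyperpiano ratios
```

For the harmonic timbre the minima include the fifth (near 700 cents) and the octave (1200 cents); the second call applies the same procedure to the overtone ratios reported for the hyperpiano strings.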

The dissonance curve for the strings of the hyperpiano is shown in Fig. 3(b).
This curve mimics the shape of a harmonic dissonance curve, except that it is
stretched out over two octaves (instead of one). Thus the corresponding
hyperoctave system is built on a tuning that has its unit of repetition at the double
octave (making it analogous to the Bohlen-Pierce scale [1,8], which has its unit
of repetition at the interval 3:1). Figure 3(b) can be thought of as a sensory
map of the $\sqrt[12]{4}$ hyperoctave, which labels each hypermajor scale step between
the unison (0 cents) and hyperoctave (2400 cents). In descending order of
consonance, the minima formed by coinciding partials are the perfect hyperfifth,
major hypersixth, perfect hyperfourth, major hypersecond, minor hyperthird,
and major hyperthird. In comparing these tempered intervals to the nonuniform
string dissonance curve, the largest deviation is only 2 cents, suggesting an
inharmonic analogy to just intonation.

Fig. 3. The dissonance curve [16] is a plot of the summed dissonances of all the
sinusoidal overtones; the horizontal axis of both panels is in cents. (a) The left
curve assumes harmonic sounds with five equal partials. As shown in [14], such a
curve has minima at many of the simple integer ratios. (b) The dissonance curve for
the first five partials of a hyperoctave nonuniform string.


2.1 The Hyperoctave System 

Terhardt [18] writes “It may not only be possible but even promising to invent 
new tonal systems... based on the overtone structure.” The hyperoctave scale is 
based on the overtone structure of a sound, in this case, a string with a specific 
nonuniform geometry. Like the Bohlen-Pierce scale, it can be played on acoustic 
instruments, as demonstrated by the hyperpiano. This section delves into the 
musical possibilities of the system with a focus on its tonal possibilities. More 
specifically, the goal is to consider the hyperoctave system in terms of Cope’s 
three crucial characteristics: key, consonance and dissonance (or relaxation and 
tension), and hierarchical relationships [3]. 

As shown in Fig. 3(b), most of the scale steps fall on or near local minima,
indicating that the hyperoctave scale consists mainly of consonances.
Unfortunately, this also means that there is not a large degree of contrast between
possible consonances and dissonances, and this may limit the ability to
adequately implement Cope’s second requirement. On the other hand, there may
be other ways to obtain harmonic tension and release patterns.

Figure 4 outlines the available notes and the manner of notation for the
hyperoctave system. The lowercase v preceding the staff designates it as a hyperstaff.
Each string of the hyperpiano was recorded (these raw recordings are also
available [19]). A stand-alone sample playback module was written in Max/MSP
[10] to make it easier for a composer to explore hyperpiano music via a MIDI
keyboard, and the interface is shown in the top right of Fig. 4.

Fig. 4. (Left) A traditional grand staff displays every note in the range of the
hyperpiano and the same set of pitches displayed on a grand hyperstaff via the
simplified chromatic tone-cluster notation of Henry Cowell. The keyboard located
below depicts how these notes are arranged in relation to a MIDI keyboard. (Top
right) One of the software sample-playback devices available on the paper’s website
[19]. This is the Max for Live interface which integrates easily into Ableton Live.
(Bottom right) The α and ω sets parallel the two whole tone scales in 12-tet.


2.2 First Compositions: Giovanni Dettori 

Italian composer Giovanni Dettori’s initial reaction to the system (evidently due 
to its overabundance of consonant intervals) was to exclaim, “Everything that 
I play sounds ‘motionless’ after a few seconds.” His observation, in this regard, 
synchronizes well with Piston’s [13] conviction that, “It cannot be too strongly 
emphasized that the essential quality of dissonance is its sense of movement.” 
But as Dettori persisted in tentative compositional exploration combined with 
attentive listening, he began to discover ways to add harmonic motion. And upon 
the completion of his first piece, Improvviso for Hyperpiano [19], Dettori wrote: 

I feel that the more I play around with hyperpiano, the more I feel
comfortable. Through chromatic writing, I don’t have that feeling of ‘static
soundscape’ that I had at the beginning. For example, if I insist on a set
of pitches for a while then moving the same set chromatically (upwards or
downwards by parallel motion or both by contrary motion) I have a feeling
of modulation, or at least of pretty strong harmonic change. Also, listening
to my Improvviso piece I don’t feel it lacks modulation. This is to say that
my ‘fears’ about the limitations of the system were probably more
‘cultural’ (related to decades of listening and writing habits) than perceptive.

The preliminary pitch-set technique described above was not used systematically,
and it was still in the process of taking shape while Dettori was composing
his Improvviso. In fact, he later wrote, “I found traces of it after I was sketching
the music.” The basic technique involves dividing the hyperchromatic scale into
two transpositions of whole-hypertone scales, and normalizing one transposition
before moving to the other transposition (to add a sense of tension) which then
resolves back to the original normalized transposition (to add a sense of release).
Although each transposition is composed of traditional augmented chords, due to
the novel context and the inharmonic spectrum, Dettori continues, “they don’t
have the dominant function we usually assign to augmented chords.” These two
transpositions are depicted in Fig. 4. W.A. Mathieu [9] has referred to the two
whole-tone scale transpositions in 12-tet as the α and ω sets, and these names
can also be applied to the two whole-hypertone scale transpositions.
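
The observation that each whole-hypertone transposition reduces to a traditional augmented chord can be checked with simple arithmetic on cents. The sketch below assumes the 200-cent hypersemitone of the hyperoctave system and twelve hyperchromatic steps per hyperoctave; which of the two sets is labeled as the first one (the set containing step 0) is our arbitrary choice for illustration.

```python
HYPERSEMITONE = 200   # cents; twelve of these span the 2400-cent hyperoctave

# Hyperchromatic scale within one hyperoctave, in cents above the tonic.
hyperchromatic = [k * HYPERSEMITONE for k in range(12)]

# The two whole-hypertone scales (steps of 400 cents) that partition it.
first_set = hyperchromatic[0::2]    # 0, 400, 800, ..., 2000
second_set = hyperchromatic[1::2]   # 200, 600, 1000, ..., 2200


def octave_classes(cents):
    """Pitches reduced to traditional octave equivalence (mod 1200 cents)."""
    return sorted({c % 1200 for c in cents})


print(octave_classes(first_set))    # [0, 400, 800]
print(octave_classes(second_set))   # [200, 600, 1000]
```

Each set collapses to three pitch classes a major third (400 cents) apart, that is, an augmented triad in 12-tet terms, and the two triads lie 200 cents apart.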

Of course, the Improvviso is more complex than bouncing back and forth
between the α and ω sets; many parallel logics converge that are difficult to
describe completely. For instance, tension can also be achieved by breaking
an established harmonic rhythm (and so breaking psychological expectations).
Dettori found that the pitch-set approach not only helped with inducing harmonic
tension-release patterns, it also provided him with a framework of hierarchical
order and it helped him to not feel “lost” in the hyperoctave soundscape.
To Dettori’s ear, the tonic chord consists of a pitch-set plus a specific bass note.
For instance, a piece could be written in the key of αvC, ωvC, αvC♯, etc. In this
notation, the α or ω designates the pitch-set, while the pitch class designates
the bass note. Taken together, these designate the key. Dettori prefers to think
of other chords from the tonic pitch-set (i.e., with a different note in the bass) 
as relative tonic chords. Dettori perceives two dominant chords; they both are 
based on the pitch-set antithetical to the one on which the tonic is built, but 
one of them has an ascending leading tone in the bass, while the other one has a 
descending leading tone in the bass; in this respect, the leading tones are located 
a hypersemitone (i.e., 200 cents) above or below the tonic bass note. And when 
a cadence ends on a relative tonic chord, Dettori perceives less conclusiveness 
than when it ends on the tonic chord. 

Table 1 provides a summary with links to the existing repertoire of music that 
adventurous composers (to whom we are grateful) have chosen to create using 
the hyperoctave system. Together, these works provide a fairly diverse taste of 
the possibilities. 

2.3 Call for Compositions 

“After creating a new scale, how can one quickly find out what it is good for?”
asked Mathews, Pierce, and Roberts [7]. “Are there listening tests and laboratory
studies that can precede the long slow process of trying to compose significant
music with the new scale?” While it is certainly possible to try to
“think through” many of the issues that arise with any new scale system, the likely
answer to this question is that it takes time, and the ear needs to become
accustomed to the new sounds. Our approach is to provide software versions of
the hyperpiano (such as in Fig. 4) so that composers can easily play, listen,
contemplate, and acclimate. We would like to encourage participation in what we
believe is a rewarding compositional experience. Digital samples of every string
of the hyperpiano, and several versions of software playback modules implementing
MIDI versions of the hyperpiano, can be found at the paper’s website
(http://sethares.engr.wisc.edu/papers/hyperInstruments.html) [19].

Table 1. Compositions and improvisations for hyperpiano

G. Dettori
  Improvviso for Hyperpiano (http://sethares.engr.wisc.edu/Sounds/hyperOctaveSongs/Improvviso.wav)
  Miniature Variations, for Hyperpiano (http://sethares.engr.wisc.edu/Sounds/hyperOctaveSongs/MiniatureVariations.wav)

P. Eisenhauer
  Arroyo (http://sethares.engr.wisc.edu/Sounds/hyperOctaveSongs/Arroyo.wav)

H. Straub
  Gon-Tanz (http://sethares.engr.wisc.edu/Sounds/hyperOctaveSongs/Gon-Tanz.wav)

B. Hamilton
  Hyperthing (http://sethares.engr.wisc.edu/Sounds/hyperOctaveSongs/Hyperthing.wav)

M. Tristan
  Mandala No. 1 (http://sethares.engr.wisc.edu/Sounds/hyperOctaveSongs/MandalaNo1.wav)
  Temple Bell Sketch (http://sethares.engr.wisc.edu/Sounds/hyperOctaveSongs/TempleBellSketch.wav)
  Palimpsest (http://sethares.engr.wisc.edu/Sounds/hyperOctaveSongs/Palimpsest.wav)
  Ayutthaya Rhapsody (http://sethares.engr.wisc.edu/Sounds/hyperOctaveSongs/AyutthayaRhapsody.wav)
  Siamese Cat (http://sethares.engr.wisc.edu/Sounds/hyperOctaveSongs/SiameseCat.wav)

C. Devizia
  Blue Rorqual (http://sethares.engr.wisc.edu/Sounds/hyperOctaveSongs/BlueRorqual.wav)

J.-P. Kervinen
  Ten Two-Part Hyperinventions (http://sethares.engr.wisc.edu/Sounds/hyperOctaveSongs/TwoPartHyperInventions.mp4)
  Hyperinvention #2 (Glitch) (http://sethares.engr.wisc.edu/Sounds/hyperOctaveSongs/HyperInventionNo2(Glitch).wav)
  Hyperinvention #3 (Variation) (http://sethares.engr.wisc.edu/Sounds/hyperOctaveSongs/HyperInventionNo3(Variation).wav)
  Hyperinvention #8 (Variation) (http://sethares.engr.wisc.edu/Sounds/hyperOctaveSongs/HyperInventionNo8(Variation).wav)

S. Weigel
  Gold-teased Peppermint (http://sethares.engr.wisc.edu/Sounds/hyperOctaveSongs/GoldTeasedPeppermint.wav)

W.A. Sethares
  HyperScarlatti (http://sethares.engr.wisc.edu/Sounds/hyperOctaveSongs/hyperScarlatti.wav)

2.4 Extending the Hypersystem 

A natural extension of the hyperoctave system is to the $\sqrt[24]{4}$ quarter-hypertone
scale. Figure 3 indicates the steps of the hyperchromatic scale with the solid
red lines, while the extended notes are drawn as dashed green. Since most of the
dashed lines occur near maxima of the dissonance curve, together these provide a
much higher degree of contrast between consonance and dissonance. Closer
examination reveals three unfortunate aspects of this system. First, the hypermajor
scale is built on the hyperchromatic scale which is an enharmonic equivalent of a
traditional whole-tone scale, and each interval that the quarter-hypertone scale
adds to the initial hyperchromatic scale ultimately forms a traditional whole-tone
scale constructed on a different transposition. Therefore, a hypermajor key
built on the hypermajor scale cannot exploit the newly generated dissonant
intervals because the whole-tone scale can only be transposed two ways. Second,
the perfect hyperfourth shares octave equivalence with the major hyperseventh,
and thus confuses tone relations within the hypermajor scale. Finally, there is
the problem of “the extra-wide leading tone” inherited from the hyperoctave
system, which may cause cadences to sound less final.

Fig. 5. Two dissonance curves for the first five partials of a hyperoctave nonuniform
string are shown. The ascending hypermajor scale contains a raised leading tone, while
the descending hypermajor scale contains a lowered leading tone. Below the scales are
the various modes that can be derived from each hypermajor scale. Scale steps with
bold red borders represent either a root, third, or fifth in relation to the various triads
of a hypermajor key. Roman numerals identify the various triad types constructed on
the given scales. Pitch classes which are preceded with v are hypernotes, and pitch
classes which are not preceded with v are traditional notes. All of the pitch classes are
structured in relation to the key of vC. Red notes represent pastel tones (see text).
(Color figure online)

Fortunately, there is an elegant solution to each of these problems: displacing
the major hyperseventh. If the hyperseventh is augmented to 2300 cents when
ascending and diminished to 2100 cents when descending, the scale contains
every possible quarter-hypertone interval (except the hypertritone) in the
hypermajor key! This allows many possibilities for dissonant intervals. It eliminates
octave equivalence between the perfect hyperfourth and the major hyperseventh.
And, fortuitously, it also addresses the leading tone problem.

Fig. 6. Two keyboard layouts for the quarter-hypertone system are shown. The upper
layout is designed for a traditional seven-plus-five keyboard, while the lower layout is
designed for an adapted keyboard. Each layout depicts how the hyperchromatic scale
relates to the given keyboard design. The augmented and diminished hypersevenths
are also labeled in relation to the key of vC while the remaining extended notes are
indicated by their gray color. The diagonal lines on the adapted keyboard represent
locations where keys have been removed.


A tonal map of the quarter-hypertone system is outlined in Fig. 5. The red-colored
notes (as distinguished from the gray) share octave equivalence with
either the root, third, or fifth of a given triad, even though they are not part of
the delineated key. For example, the fifth of the tonic triad is vG, a pitch that
is traditionally identified as D. But transposing the D down an octave gives
vC♯, which is foreign to the key of vC. We call such notes pastel tones; they are
unique to hyperoctave music, and they allow for some intriguing embellishments.
Pastel tones should be used cautiously, however, because they may confuse tone
relations when listened to with an ear trained in octave-based music. On the
other hand, they may provide a degree of ambiguity and “color.”

To aid in visualization, Fig. 6 provides two quarter-hypertone keyboard layouts
that complement the tonal map in Fig. 5. Collectively, the tonal map and
the keyboard layouts are intended to supply enough information to propel the
novice into tonal quarter-hypertone composition.

A version of the adapted keyboard in Fig. 6 was constructed by modifying
a USB MIDI Controller (the M-Audio Keystation 88es). This controller was in
production for many years and can generally be purchased second-hand at a
reasonable price. The key arrangement on this controller has a reputation among
enthusiasts as being easy to modify, and we found this to be true. Figure 7
shows a photograph of the completed controller. The gray color on some of the
keys was produced by spraying on coats of primer, gray spray paint, and clear
gloss polyurethane. We believe the adapted arrangement provides a more
intuitive quarter-hypertone controller than the traditional keyboard arrangement in
Fig. 6.

Fig. 7. The adapted keyboard arrangement of Fig. 6 embodied by a modified M-Audio
Keystation 88es


3 Inharmonic Strings for 10-Tone Equal Temperament 

In the familiar 12-tone equal temperament, the octave is divided into 12
equal-sounding semitones, which are in turn divided into 100 barely perceptible cents.
Instead, 10-tet divides the octave into ten equal-sounding pieces. Yet, from the
orchestra to the radio, Western music overwhelmingly favors 12-tet while the 
existence of 10-tet is comparatively unknown. There may be an underlying reason 
for this lack—that harmonic tones sound out-of-tune (or dissonant) when played 
in 10-tet. For instance, the closest 10-tet interval to a musical fifth is 720 cents, 
as opposed to the 12-tet perfect fifth of 700 cents. The 10-tet fifth is likely to 
be heard as a sharp, out-of-tune 12-tet fifth. A full major chord is even worse. 
The problem is not simply that harmonic sounds are dissonant in 10-tet. In 
tonal music, the motion from consonance to dissonance (and back again) plays 
an important role. Thus the fact that most intervals in 10-tet are dissonant 
when using harmonic sounds makes it almost impossible to achieve the kinds of 
contrasts needed for tonal motion. 

Using the ideas of [16], it is straightforward to design spectra for sounds that
will appear consonant at the 10-tet intervals. Let $r = \sqrt[10]{2}$ and consider a sound
with its first six overtones at $f^* = \{f, r^{10} f, r^{16} f, r^{20} f, r^{23} f, r^{26} f\}$. The “principle
of coinciding partials” suggests that such a spectrum should have a dissonance 
curve with minima at many of the ratios of these partials. All of these ratios 
are integer powers of r, and hence form intervals that lie on steps of the 10-tet 
scale. Thus intervals such as the 720-cent “fifth” and the 480-cent “fourth” need 
not sound dissonant and out-of-tune when played with sounds that have this 
spectrum (even though they appear very out-of-tune when played with normal 
harmonic sounds). 
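
As a quick numerical check of this spectrum, the following Python lines evaluate the six partials as powers of the tenth root of two and express the interval between the second and third partials in cents. The variable names are ours; the exponents are those given in the text.

```python
import math

r = 2 ** (1 / 10)                   # one 10-tet step (120 cents)
steps = [0, 10, 16, 20, 23, 26]     # exponents of r for the six partials

ratios = [r ** k for k in steps]
print([round(x, 4) for x in ratios])   # [1.0, 2.0, 3.0314, 4.0, 4.9246, 6.0629]

# Interval between the 2nd and 3rd partials: six 10-tet steps.
fifth_cents = 1200 * math.log2(ratios[2] / ratios[1])
print(round(fifth_cents, 6))           # 720.0
```

Because every partial is an integer power of r, the interval between any two partials is a whole number of 120-cent steps, which is why the dissonance minima fall on the 10-tet scale.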

While it may be straightforward to create electronic simulations of sounds
such as $f^*$ [17], it is less obvious how to create such sounds acoustically. This
can be formulated mathematically as an optimization problem by assuming that
an inharmonic string consists of $n$ segments, each with length $\ell_i$ and mass
density $\mu_i$. The spectrum of that string will have overtones at $f(\ell, \mu)$ where
$\ell = \{\ell_1, \ell_2, \ldots, \ell_n\}$ and $\mu = \{\mu_1, \mu_2, \ldots, \mu_n\}$. Then the goal is to minimize

$$J = \min_{\ell, \mu} \| f^* - f(\ell, \mu) \| \qquad (1)$$

in some appropriate norm. The optimization problem (1) can be solved using a
gradient descent method

$$\mu(j+1) = \mu(j) - \alpha_\mu \frac{\partial J}{\partial \mu}, \qquad \ell(j+1) = \ell(j) - \alpha_\ell \frac{\partial J}{\partial \ell} \qquad (2)$$

where $\mu(j)$ and $\ell(j)$ are the values of the densities and lengths at iteration $j$, and
where $\alpha_\mu$ and $\alpha_\ell$ are the algorithm stepsizes (which may be different because
the units of $\mu$ and $\ell$ are different).

Fig. 8. The 10-tet string, its dissonance curve, a possible keyboard layout, and an
overview of possible tonal functions in the 10-tet system.

For 10-tet sounds, $f^* = f_0\{1, 2, 3.0314, 4, 4.9246, 6.0629\}$. Using $n = 5$
segments (with the constraint that there are only two different densities, for ease of
construction), the optimal solution to (1) is shown in Fig. 8. We built the string
(using the same technique of unwinding the wound portions as in [4]), sampled
the sound, and calculated the Fourier transform to verify that the desired
spectrum was achieved (the sum of absolute errors over all partials was 0.04
percent).
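
The structure of the gradient descent iteration, with separate stepsizes for lengths and densities, can be sketched as follows. This is only an outline under stated assumptions: the gradient is approximated by central differences, and the objective `toy_objective` is a hypothetical quadratic stand-in chosen so that the result is easy to verify; it is not the string-spectrum error of the paper, which would require a model of the overtones of a segmented string.

```python
def numerical_gradient(J, x, h=1e-6):
    """Central-difference approximation of the gradient of J at the list x."""
    grad = []
    for i in range(len(x)):
        up, down = list(x), list(x)
        up[i] += h
        down[i] -= h
        grad.append((J(up) - J(down)) / (2 * h))
    return grad


def descend(J, lengths, densities, step_len, step_den, iterations):
    """Gradient descent on J(lengths, densities) with two different stepsizes."""
    n = len(lengths)
    for _ in range(iterations):
        g = numerical_gradient(lambda v: J(v[:n], v[n:]),
                               list(lengths) + list(densities))
        lengths = [l - step_len * gl for l, gl in zip(lengths, g[:n])]
        densities = [m - step_den * gm for m, gm in zip(densities, g[n:])]
    return lengths, densities


def toy_objective(lengths, densities):
    """Hypothetical stand-in for J: minimized at length 0.5 and density 0.01."""
    return (sum((l - 0.5) ** 2 for l in lengths)
            + sum((100.0 * (m - 0.01)) ** 2 for m in densities))


lengths, densities = descend(toy_objective, [0.2, 0.8], [0.005, 0.02],
                             step_len=0.25, step_den=2.5e-5, iterations=60)
print(lengths, densities)
```

The toy objective is far more sensitive to the densities than to the lengths, so a single stepsize would either diverge in one set of variables or crawl in the other; this mirrors the remark that the stepsizes may differ because the units differ.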

As with the hyperpiano in Fig. 4, we created a software sample playback
module in order to enable composers and musicians to explore the
instrument. An immediate response from Carlos Devizia called Ants at the
Office (http://sethares.engr.wisc.edu/Sounds/hyperOctaveSongs/AntsAtTheOffice.wav)
demonstrates one musical possibility. Marcus Tristan is composing an
ensemble piece, Circles of Celestial Light, using the 10-tet string-based sampler
as an “electro-acoustic” layer. Giuseppe Testa has composed two studies
using the 10-tet string samples called Moten
(http://sethares.engr.wisc.edu/Sounds/hyperOctaveSongs/Moten.wav) and Vinby
(http://sethares.engr.wisc.edu/Sounds/hyperOctaveSongs/Vinby.wav). Figure 9 shows
a possible cadence performed with this 10-tet software module in order to
demonstrate the possibility of tonal music in 10-tet.

Fig. 9. A cadence in 10-tet demonstrates how the 10-tet inharmonic string may be
capable of supporting tonal structures. This cadence is played using the 10-tet string
at the paper’s website [19].


4 Conclusions and Acknowledgements 

This paper has presented an extended analysis of the hyperoctave system, which 
is based on an inharmonic (nonuniform) string that forms the basis of the 
hyperpiano. But there are myriad possible inharmonic strings, and a design for a 
10-tet string provides one example. We invite everyone to use our designs, playback 
modules, software, and strings to investigate inharmonic musical realms such 
as the hyperoctave and 10-tet. These and other resources may be found at the 
paper’s website [19]. 

The authors would like to thank the composers who have worked with the 
hyperpiano including M. Tristan, C. Devizia, P. Eisenhauer, B. Hamilton, J.-P. 
Kervinen, H. Straub, and S. Weigel. We would like to especially thank Giovanni 
Dettori for his compositions and for sharing his experiences and thoughts over 
the course of a long thread of emails. 

Abstract. Comprovisador is a real-time, networked system through which a 
conductor/composer mediates the interaction between a solo improviser and an 
ensemble of musicians who sight-read an animated score. The system uses 
multiple computers - one host and several clients - to perform algorithmic 
compositional procedures with the musical material improvised by the soloist 
and to coordinate the response of the ensemble. The present paper focuses on 
the main aspects of the compositional algorithms used, giving an overview of the 
concept and structure of the system and describing the main features of its notation 
interface. Some of the real-world opportunities for development and testing that 
have occurred are also reported. 

Keywords: Musical improvisation • Algorithmic composition • 
Dynamic notation • Network musical performance • Graphical interface 


1 Introduction 

The development of Comprovisador aimed to join the broad concepts of improvisation 
and composition in a real-time environment, through machine listening, algorithmic 
procedures and dynamic notation. The goal was to enable soloist-ensemble 
interaction expressed as a coordinated (composed) ensemble response to an 
improvisation. Efforts were made to allow the listener to perceive the composed material as 
originating in the soloist’s improvisation. Additional levels of interactivity were 
envisaged through a feedback loop - the soloist’s reaction to the ensemble’s response - 
and through mediation - which consists of the manipulation of algorithmic parameters. 
The system was designed to be flexible regarding instrumentation, accepting 
improvisers from different backgrounds and classically trained ensemble members. 

1.1 Concept 

In broad terms, Comprovisador is able to listen to an improvisation, decoding pitches, 
intervals and durations, and to facilitate the creation of different musical responses 
through algorithmic compositional procedures. Control of these procedures is 
performed in real-time, from a hardware terminal. The outcome of such procedures is 
displayed to players in the form of an animated staff-like score, viewed on a computer 
screen. Through wireless connectivity, it is possible to place the computers - and, 
therefore, the musicians - apart from each other, allowing non-standard settings. 

1.2 Background 

This system can be framed in four different areas of computer use in musical creation: 
(1) computer-assisted composition, (2) improvised music with human-machine 
interaction, (3) dynamic musical notation and (4) networked music performance. 

Computer-assisted composition (CAC) was born with the creation of the work 
“Illiac Suite”, by Lejaren Hiller, in 1957 [6]. It consists of a compositional practice that 
uses algorithmic procedures performed by a computer, typically in deferred-time. 

As an example of a human-machine interaction system, we can point out a project 
entitled “OMax”, carried out by the research team “Musical Representations” of 
IRCAM [4]. “OMax” consists of a computer program capable of learning, in real-time, 
the typical characteristics of a musician’s improvisational style and of playing along 
with that musician interactively. The main difference between Comprovisador and most 
systems designed for improvised interactive music performance lies in the type of 
output: in most of these systems, the computer interacts directly with the musician, 
outputting electroacoustic sounds either synthesized or sampled; in the case of 
Comprovisador, the computer coordinates the musical response by an ensemble of 
musicians who sight-read a generated score. 

Since the late 1990s, dynamic musical notation has been increasingly used in 
algorithmic real-time music systems, enabling various kinds of new interactive features 
such as audience participation, in which the audience influences the behavior of the 
algorithms [8]. Its growing use is a result of recent technological developments 
which facilitate its implementation, such as tablets, laptop computers and video 
projectors. Also, regarding software, advancements have been made which allow the use 
of staff-like notation in real-time applications. Among these we find MaxScore [10], 
INScore [7] and the library used in Comprovisador: Bach [2]. Still, many approaches to 
dynamic notation tend to use animated graphic scores [8, 12, 15] and other kinds of 
non-staff notation - for example, Jason Freeman used colored LED light tubes to 
convey pitch and dynamics information to performers in his work “Glimmer” [8]. 
Such approaches have a visual level that can be in itself an aesthetic goal, since it is 
common to have the animated notation projected for audiences to see. Also, these 
approaches rely on the improvisational skills of all performers to make their own 
interpretation of the score, whereas Comprovisador - apart from the soloist (or soloists) 
- requires more traditional sight-reading skills from the ensemble performers, since it is 
based on staff-like notation. Nonetheless, in certain situations, ensemble members may 
be required to improvise through the use of textual instructions. This concertino-ripieno 
kind of function separation - one of the key aspects of Comprovisador - seems to be 
uncommon among other real-time notation work. 

Networked music performance is another practice that has emerged in recent 
decades thanks to the development of computer network technologies and the creativity of 
musicians [11]. It consists of performance situations where a group of musicians 
interact over a local or wide area network (LAN or WAN). This interaction can be 
achieved, for example, by audio streaming, score rendering or strategies for graphical 
direction. In our case, audio streaming is not presently used (only LAN performances 
thus far). Among systems that use graphical direction strategies, we find Decibel 
ScorePlayer, Quintet.net and MaxScore [13]. In ScorePlayer, the main strategy consists 
of scrolling the score from right to left under a fixed vertical line, while the other two 
systems feature a fixed score and a cursor which moves horizontally. Both strategies 
were adopted by Comprovisador but the former was abandoned at an early stage. 
Instead, a new strategy was developed in which a bouncing ball is responsible for 
synchronizing attacks and/or conveying a pulse (see Sect. 4). Programming of the 
bouncing ball incorporates motion laws that convincingly translate arsis and thesis 
sensations. 

On an aesthetic level, besides the four areas mentioned above, the development of 
Comprovisador has drawn inspiration from gestural languages for real-time composition 
and conducted group improvisation such as Walter Thompson’s “Soundpainting” 
[17] and Lawrence Morris’s “Conduction” [16]. In fact, the author’s personal 
experience as a performer in this field has motivated the conceptualization of some of the 
system’s features. 

2 Performances 

Since 2015, Comprovisador has been used in public performances on five different 
occasions. Each performance has been preceded by development stages and short 
periods of rehearsal. In “Comprovisação nº 5”¹, which took place in the foyer of the 
Lisbon College of Music (ESML) in January 2017, the rehearsal stage spanned a 
four-month period of weekly rehearsals with an ensemble consisting of 12 ESML 
students (flute, oboe, alto saxophone, tenor trombone, bass trombone, tuba, electric 
bass, marimba, piano, two singers - soprano and mezzo - and violin) and served as a 
test bed for ongoing development. 

Hence, every public performance has served as a developmental milestone. For 
example, in “Comprovisação nº 4” (see Fig. 1), harmonic as well as multi-percussion 
specific notation were introduced, while “Comprovisação nº 5” featured singers (with 
real-time generated lyrics), multiple soloists and a non-standard setting (with musicians 
playing over 50 m apart from each other), among many system improvements. 


¹ The performance video of “Comprovisação nº 5” is accessible through the following URL: 
https://youtu.be/rXNTrNzN5z0. 

Fig. 1. Performance of “Comprovisação nº 4”, during the SMC2016, Hamburg, 01/09/2016; 
João Barradas - MIDI accordion | [author] - Interface | Radar Ensemble. 


3 System Structure 

3.1 Hardware 

To be fully operational, Comprovisador needs the following hardware equipment (see 
Fig. 2 (left)): 

① a [number of] microphone(s) - to capture the improvisation of the soloist(s) 
(only necessary for non-MIDI instruments); 

② an audio/MIDI interface - to convert the analog signal of the microphone(s) 
into digital signal and/or to input raw MIDI data; 

③ a host computer - which receives the digital audio signal and/or MIDI data 
and where algorithmic procedures take place; 

④ a control surface² - in which the algorithm parameters are manipulated; 

⑤ a wireless router - which establishes communication between computers; 
and 

⑥ a number of client computers - to render and display the animated score to 
the musicians in the ensemble. 

Fig. 2. Comprovisador: hardware setup (left); host application overview (right). 

² Currently, the system is optimized to operate with the Novation Launch Control XL. 

Typically, one client computer is used for every two performers. In some cases, 
though, it is convenient to use one computer for each performer: for instance, with large 
and/or multi-staff instruments like the piano. For intonation purposes, there is a feature 
for singers that provides them with cue sounds through a set of earphones. This feature 
also requires one computer for every singer. 

The system is fully reconfigurable regarding the instrumentation of the ensemble, 
with regard to number, transposition and range, as will be seen in Sect. 3.2. According to 
our testing during rehearsals, it is compatible with Mac OS X and Windows systems. 
Intel Core i5 processors (or better) are recommended. 

3.2 Software 

Software for this system is being developed in Max 7 [14], with extensive use of the Bach 
library [2] for its notation features, CAC tools and Max integration. The system 
consists of two applications: one which runs on the host computer and another which is 
instantiated on each of the client computers. 

The host application is responsible for receiving and analyzing the input from the 
soloist(s), calculating the compositional procedures and responding to commands from 
the conductor/composer. The client application is in charge of rendering the generated 
score and displaying it to the musicians. 

The host application consists of multiple modules (see Fig. 2 (right)), namely: 

• pitch tracker - here, the musical notes played by the soloist are deciphered in 
real-time from the digital audio signal input; the object sigmund~ [3], designed 
by Miller Puckette, is at the heart of this module; 

• MIDI parser - in the case of MIDI enabled instruments, a MIDI parsing module is 
used instead of the pitch tracker; polyphonic and multi-channel input is accepted; 

• control interface - this module consists of two control groups containing a total of 
four slots for algorithms; algorithmic parameters are manipulated in real-time by the 
performance conductor/composer; the control interface provides graphical feedback 
for all commands performed on the external control surface (mirroring) and it is 
possible to store and recall parameter presets; it also provides information to its 
operator about ongoing algorithmic procedures (see Fig. 3); moreover, it features an 
instant message system (not visible in Fig. 3) through which messages (both 
predefined and written on-the-fly) can be sent to the players; 

• compositional algorithms - there are two distinct algorithms - Harmony and 
Contour - instantiated in all four slots of the control groups (each slot can host any 
of the two algorithms); instruments can be assigned to any of these four instances, 
which work in parallel; each algorithm generates different musical responses 
(broadly, chords and melodic contours) when receiving pitch data from the pitch 
tracker or the MIDI parser and parametric data from the control interface; 
furthermore, each algorithm has two main variations; the generated musical responses 
take into account idiomatic aspects of the assigned instruments such as range (and 
whether it is dependent on dynamics), polyphonic capabilities, etc.; 

• communication port - generated musical data are sent via UDP or TCP protocols 
to client computers; data are rendered into musical notation by every client 
application (the notation interface). 

Many aspects of the host application - communication port as well as parts of 
algorithms and control interface - are automatically configured on startup. This is done 
by means of a script that cross-references an instrumentation list against an 
instrumentation dictionary, both stored in text files. The former consists of a simple list 
of the instruments to be used in a session while the latter contains a large set of 
information specific to each instrument (family, range, transposition, clef, dynamic 
range mapping, string tuning, initial IP port number, etc.). 
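
As a rough illustration of this startup step, the sketch below cross-references a session list against a dictionary of instrument data. It is written in Python for readability only (the actual system is built in Max); the function name `configure_session`, the field names and every value in `INSTRUMENT_DICTIONARY` are hypothetical placeholders, and the real text-file formats are not described here.

```python
# Hypothetical stand-in for the instrumentation dictionary text file.
# Ranges are MIDI note numbers; all values are illustrative only.
INSTRUMENT_DICTIONARY = {
    "flute": {"family": "woodwind", "range": (60, 96), "transposition": 0,
              "clef": "G", "initial_port": 8000},
    "alto saxophone": {"family": "woodwind", "range": (49, 80), "transposition": -9,
                       "clef": "G", "initial_port": 8010},
    "tuba": {"family": "brass", "range": (28, 58), "transposition": 0,
             "clef": "F", "initial_port": 8020},
}

def configure_session(instrument_list, dictionary):
    """Look up each instrument of the session list in the dictionary and
    return one configuration record per instrument, in list order."""
    session = []
    for name in instrument_list:
        if name not in dictionary:
            raise KeyError(f"unknown instrument: {name}")
        record = dict(dictionary[name])  # copy so the dictionary stays untouched
        record["name"] = name
        session.append(record)
    return session

session = configure_session(["flute", "tuba"], INSTRUMENT_DICTIONARY)
```

Keeping the per-instrument data separate from the session list means a new line-up only requires editing the short list.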



Fig. 3. Comprovisador.host: control interface. 

4 Notation Interface 

4.1 Overview 

The notation interface was designed so that one client computer can serve 
two instruments, regardless of the range or transposition of the instruments used. In some 
cases, it is preferable to use a single-instrument-per-computer configuration. 

Graphical objects of the interface adapt to the screen of any modern laptop computer, 
independently of the configuration used. This is achieved using JavaScript 
inside Max 7 to instantiate and position all graphical objects. 

Comparing the layouts of the two configurations (see Fig. 4), we see that the 
single-instrument layout has some advantages. First, multi-staff notation can be 
used. Second, both the dynamics bar (under the staff) and the direction bar (over the 
staff) can assume larger dimensions, ensuring faster detection of information and 
better legibility, these being good principles of graphical interface design [1]. This 
space optimization was motivated by musicians’ suggestions and was found to have a 
positive impact on performance. 

Fig. 4. Comprovisador.client: dual-instrument layout (left) vs. single-instrument layout (right). 
(Color figure online) 

The notation objects consist of a combination of bach.roll and bach.score 
objects. While the former renders proportional durations, the latter renders standard 
rhythmic notation [2]. 

The dynamics bar is a colored bar with the dynamics text displayed at its 
center. Again, following good principles of graphical interface design, both background 
color and text size (3D space) change according to the level of dynamics, in a reactive 
fashion. The color that symbolizes pppp is cyan and the one attributed to ffff is red. Any 
level in between will assume a proportional mixture of the two colors, maintaining the 
same perceived level of brightness. 
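
A minimal sketch of the proportional color mixture follows, in Python for illustration only. It assumes a ten-step dynamics scale from pppp to ffff and a plain linear blend in RGB; both are assumptions, and the equal-brightness correction mentioned above is not modelled.

```python
# Assumed ten-level scale; the text only names the two extremes.
DYNAMIC_LEVELS = ["pppp", "ppp", "pp", "p", "mp", "mf", "f", "ff", "fff", "ffff"]
CYAN = (0.0, 1.0, 1.0)  # color for pppp
RED = (1.0, 0.0, 0.0)   # color for ffff

def dynamics_color(level):
    """Linear RGB blend between cyan (softest) and red (loudest)."""
    position = DYNAMIC_LEVELS.index(level) / (len(DYNAMIC_LEVELS) - 1)
    return tuple((1 - position) * c + position * r for c, r in zip(CYAN, RED))

print(dynamics_color("mp"))
```

A perceptually uniform color space would be a natural choice for keeping brightness constant, but which method the system uses is not stated.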



Fig. 5. Comprovisador.client: dynamics bar - text size (3D space) and background color. (Color 
figure online) 

Concerning text, whenever the level is being changed, the words cresc. or dim. 
appear and move forward or backward in a three-dimensional space (see Fig. 5). This 
feature is achieved by the use of OpenGL graphics rendering, using the Jitter object 
jit.gl.text3d [14]. 

These reactive features were highly valued by musicians, who reported being able 
to easily identify the dynamic level while keeping full focus on the musical notes. 

Regarding the direction bar, it also features OpenGL graphics in order to render at a 
high frame rate (around 60 fps) a small bouncing ball which allows musicians to 
synchronize attacks and play in a given tempo. 

Using the same OpenGL context, musical direction terms and other information are 
displayed. To ensure detectability, each time a new entry is displayed, it pulsates in 
bright white. 

Fig. 6. Comprovisador.client: direction bar (left) vs. “folded” sine wave (right). (Color figure 
online) 

As demonstrated in Fig. 6, the motion described by the ball derives from a sine 
function with its output converted to absolute values. This has been tested against other 
synchronization strategies but musicians had a better response to the bouncing ball 
approach. Also, the fading trail has proven to have a positive impact on motion 
perception. Testers reported it was easier to perceive the bouncing motion even when 
not looking directly at the object. 
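
The “folded” sine can be sketched in a few lines of Python. The function name `ball_height`, the phase convention (one arch per beat, touching zero on the beat) and the `amplitude` parameter are assumptions made for illustration; only the use of the absolute value of a sine comes from the description above.

```python
import math

def ball_height(time_s, beat_s, amplitude=1.0):
    """Height of the bouncing ball at time_s for a beat lasting beat_s seconds.
    |sin| draws one arch per beat and returns to zero at every beat boundary,
    which is the moment the ball 'hits' and the attack should sound."""
    return amplitude * abs(math.sin(math.pi * time_s / beat_s))

# Sample one beat at 120 bpm (0.5 s per beat)
for step in range(5):
    t = step * 0.125
    print(f"t={t:.3f} s  height={ball_height(t, 0.5):.3f}")
```

Because the slope of the absolute sine changes sign abruptly at each zero, the ball reverses direction sharply on the beat, which may be part of why it reads as a clear attack point.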

4.2 Reading Modes 

There are four different reading modes or directives, corresponding to the two 
variations of each of the algorithms. The modes and their characteristics are: 

In Sync with Green Ball - Harmony, Variation 1 

• proportional notation; 

• notes are written in real-time, from left to right, as they are output by the host; 

• when a note’s duration line stretches off the play region (fixed darker rectangular 
area which represents a domain of 5 s), it reappears at the beginning of the same play 
region (see Fig. 4 (right)) - this feature replaces traditional page turning; 

• likewise, notes written near the right border of the region are instantly duplicated 
near its left border (see Fig. 7 (left), notes E and C#); 

• a reading time window of about a second and a half in duration is calculated so that 
the player has time to read and prepare each note on their instrument; 

• notes that have already been played are erased in order to free staff space for new 
notes to be written; 

• the player should begin each sound precisely when the ball aligns vertically with the 
note and changes direction (i.e. when it “hits” the note); 

• if the note is a long one, the ball will move horizontally over the note’s duration line 
and stop at its end, disappearing, unless a new note is to be played right after it, in 
which case the ball bounces again, even if the previous note is still active; 

Fig. 7. Comprovisador.client: orange ball (passive gesture) and grid (underlying tempo) (left); 
Quantum Loop (right). (Color figure online) 
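
The wrap-around behavior of the play region can be sketched as modular arithmetic on time. The Python below is an illustration only: the function name `wrap_note` and the representation of a duration line as (start, end) pairs are assumptions, while the 5 s region length comes from the description above.

```python
PLAY_REGION_S = 5.0  # the play region represents a domain of 5 s

def wrap_note(onset_s, duration_s, region_s=PLAY_REGION_S):
    """Split a note's duration line into segments that fit inside the play
    region, returning (start, end) pairs in region coordinates. A line that
    runs off the right border continues from the left border."""
    segments = []
    start = onset_s % region_s
    remaining = duration_s
    while remaining > 0:
        length = min(remaining, region_s - start)
        segments.append((start, start + length))
        remaining -= length
        start = 0.0  # continue from the left border
    return segments

print(wrap_note(4.0, 2.0))  # a note starting 1 s before the right border
```

The same modulo operation gives the horizontal position of any onset, which is how the region can stand in for page turning.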

In Sync with Green Ball (Grid) - Harmony, Variation 2 

• the same as explained above except for the fact that there is an underlying 
metronomic tempo for all attacks; 

• a grid representing the underlying tempo is shown in the staff (see Figs. 6 (left) and 
7 (left)); 

• during long notes or rests, instead of moving horizontally or disappearing, the ball 
continues to bounce in tempo, turning orange, while the vertical 
amplitude of its movement is reduced - thus simulating a conductor’s passive 
gesture of tempo keeping [9] (see Fig. 7 (left)); whenever a response from the 
player is demanded (active gesture), the ball turns green again and bounces 
higher; 

Loop (Non-Sync) - Contour, Variation 1 


• proportional notation; 

• a melodic contour appears, all notes at once; 

• the player should loop through the notes framed inside the play region which in this 
case can be dynamically adjusted to any arbitrary portion of the displayed melody 
(refer to Fig. 4 (left)); 

• a vertical green line (play line) cycles through the play region so as to give the player 
an idea of the intended playing rate, although synchronizing with the line is not 
mandatory; 

• above all, the player should not attempt to synchronize with their fellow musicians; 
in fact, there is an intended rate discrepancy in each client’s play line in order to 
help avoid synchronicity between players; 

Quantum Loop - Contour, Variation 2 


• standard rhythmic notation; 

• a quantized melodic contour appears, all notes at once, fitted in two 4/4 measures 
(see Fig. 7 (right)); 

• the player should play in tempo with the green ball, which in this case aims at the 
beginning of every beat of a measure (instead of at every note); 

• instruments assigned to the same control group (see Sect. 3.2) will always be in the 
same beat of the same measure and in the same tempo - hence they should play in 
sync; 

• when two instruments are assigned to different control groups, one of three 
possibilities may occur: (1) both instruments are in sync; (2) both instruments are in the 
same tempo but in different positions within the loop (which may differ in size) or 
(3) the two instruments are in different tempi. 

Besides graphical rendering of the notation interface, the client application also 
carries out some algorithmic tasks that could in theory be performed by the host 
application. Examples of such tasks include the quantization used in the Quantum Loop 
mode and the transposition needed for all transposing instruments. The goal of this 
task decentralization is to unburden the host computer’s CPU and to keep the wireless 
data traffic as lightweight as possible. 

5 Algorithms 

5.1 Harmony 

Generally speaking, Harmony generates chords from the notes played by the soloist in 
his or her last musical phrase. The notes of the generated chords are automatically 
distributed to the assigned instruments and are written in their respective notation 
interfaces, under the reading directive “in sync with green ball” (grid or no grid, 
depending on the algorithm’s variation - see Sect. 4.2). 

There are two approaches to how the soloist’s notes are selected and then 
recombined to generate chords. If the positiveHarm button is set via the control 
interface (see Fig. 8), new chords will be generated from notes that were recently 
played by the soloist. Conversely, if negativeHarm is chosen, new chords will be 
generated from notes that were recently avoided by the soloist. 
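
One possible reading of the two selection modes is sketched below in Python. It assumes that “avoided” notes are the chromatic pitch classes absent from the soloist’s recent notes; that interpretation, the function name `chord_source` and the use of MIDI note numbers are all assumptions, not a description of the actual algorithm.

```python
def chord_source(recent_midi_notes, negative=False):
    """Return the pitch classes (0-11) a chord could be drawn from.
    positiveHarm-like: classes the soloist recently played.
    negativeHarm-like (assumed): the complement within the chromatic scale."""
    played = {note % 12 for note in recent_midi_notes}
    if negative:
        return sorted(set(range(12)) - played)
    return sorted(played)

phrase = [60, 64, 67, 72]                    # C4, E4, G4, C5
print(chord_source(phrase))                  # pitch classes that were played
print(chord_source(phrase, negative=True))   # pitch classes that were avoided
```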



Fig. 8. Comprovisador.host: control interface; detail of the control surface mirror block - faders 
and buttons. 

When building a chord, three different transposition modes may be used: (1) notes 
may be transposed one or more octaves in order to fit the range of the assigned 
instrument (button equiv8a set), (2) notes will never be transposed, meaning that if a note 
does not fit the range of the assigned instrument, it is filtered out (button registoFixo set) 
and (3) the default transposition mode (no button set). The default mode consists of 
finding the modulus of the soloist’s phrase from its extreme notes. Thus, all 
transpositions obey the most recently found modulus. 
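
The three modes can be contrasted in a short Python sketch using MIDI note numbers. The function names are hypothetical. In particular, the default mode is interpreted here as replacing the octave with the interval between the lowest and highest notes of the phrase; this is an assumption about what “modulus” means, since the exact procedure is not spelled out.

```python
def fit_by_octaves(note, low, high):
    """equiv8a-like mode: shift by octaves until the note is in [low, high]."""
    while note < low:
        note += 12
    while note > high:
        note -= 12
    return note if low <= note <= high else None  # None if the range is too narrow

def fit_fixed(note, low, high):
    """registoFixo-like mode: never transpose; out-of-range notes are filtered."""
    return note if low <= note <= high else None

def fit_modular(note, low, high, phrase):
    """Default-mode sketch (assumed reading): transpose by multiples of the
    interval spanned by the phrase's extreme notes instead of by octaves."""
    modulus = max(phrase) - min(phrase)
    if modulus == 0:
        return fit_fixed(note, low, high)
    while note < low:
        note += modulus
    while note > high:
        note -= modulus
    return note if low <= note <= high else None

print(fit_by_octaves(40, 60, 84), fit_fixed(40, 60, 84),
      fit_modular(40, 60, 84, [60, 67, 70]))
```

Under this reading, a phrase spanning a minor seventh would produce transpositions at multiples of ten semitones, so the resulting chords need not preserve pitch classes.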



Fig. 9. Comprovisador.host: control interface; detail of the control surface mirror block - knobs. 

In order to obtain a smooth voice leading between chords, transposition of a note 
for a given instrument will always take into account the register of the previous note 
played by the same instrument. To ensure variety, it is possible to perform sudden 
changes in global register. This can be done manually by flipping the register knob 
(see Fig. 9) or automatically in reaction to the soloist’s last played pitches, after a set 
time threshold (knob transfmRate). 







Fig. 10. Comprovisador.host: control interface - detail of the block concerning probability 
weights for musical durations and lyrics text (left); detail of the instrument assignment block and 
reference score (right). 

Switching between the two variations of Harmony is done by a toggle. When it is 
set to threshRhythm, new chords are automatically triggered at the end of a phrase, 
after a threshold set with the fader agogics. When it is in the metroRhythm state, a 
metronome is turned on, forcing all attacks to adhere to it. Durations are then calculated 
according to an editable set of probability weights (see Fig. 10 (left)). The rate of the 
metronome is set with the fader agogics, as well. 
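
Choosing durations from probability weights amounts to a weighted random draw. The Python sketch below illustrates the idea with made-up weights; the function name `pick_durations`, the duration values (in beats) and the weights themselves are hypothetical, and in the actual system the weights are edited graphically.

```python
import random

def pick_durations(weights, count, rng=None):
    """Draw `count` durations, each chosen with probability proportional
    to its weight. `weights` maps a duration in beats to its weight."""
    rng = rng or random.Random()
    values = list(weights)
    return rng.choices(values, weights=[weights[v] for v in values], k=count)

# Hypothetical weights: short notes four times as likely as four-beat notes
duration_weights = {1: 4, 2: 2, 4: 1}
print(pick_durations(duration_weights, 8))
```

Setting a weight to zero removes that duration from the draw, while raising one weight relative to the others biases the texture toward that value.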

5.2 Contour 

The algorithm Contour captures the last musical phrase of the soloist and, after 
procedures of truncation, filtration and transposition have taken place, writes the phrase in 
the notation interfaces of the assigned instruments. The written phrase is to be played in 
loop (non-sync - see Sect. 4.2) and may undergo transformations, such as contraction 
or expansion, after a few loop iterations. These transformations always depend on 
what the soloist has played in the meantime. 

The default modular transposition approach is also used in the Contour algorithm, 
when generating a new phrase or transforming an existing one. 

Variation 2 of this algorithm (triggered by the quantizeOn toggle) consists of the 
quantization of the phrases, which allows musicians to play in sync. If a given phrase is 
quantized, its original notes remain the same (players can thus focus entirely on the 
rhythm, as they are familiar with the notes already). Durations are fitted in two 4/4 
measures using rhythms of relatively low complexity, derived from sixteenth notes and 
eighth note triplets with occasional grace notes (see Fig. 7 (right)). Further melodic 
transformations may continue to occur in this variation after a few loop iterations. 
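
To make the idea concrete, the Python sketch below snaps onsets to a grid built from sixteenth-note and eighth-note-triplet positions over two 4/4 measures. It is a simplified stand-in: nearest-point snapping, the scaling of the whole phrase onto eight beats, and the names `build_grid` and `quantize_onsets` are assumptions, and grace notes and complexity control are not modelled.

```python
from fractions import Fraction

TOTAL_BEATS = 8  # two 4/4 measures

def build_grid(total_beats=TOTAL_BEATS):
    """All sixteenth-note and eighth-note-triplet positions, in beats."""
    points = set()
    for beat in range(total_beats):
        for k in range(4):
            points.add(beat + Fraction(k, 4))  # sixteenth subdivisions
        for k in range(3):
            points.add(beat + Fraction(k, 3))  # triplet subdivisions
    return sorted(points)

def quantize_onsets(onsets_s, phrase_length_s, grid=None):
    """Scale onsets (seconds) onto the two-measure span, ignoring the original
    tempo, and move each one to the nearest grid position."""
    grid = grid or build_grid()
    snapped = []
    for onset in onsets_s:
        position = onset / phrase_length_s * TOTAL_BEATS
        snapped.append(min(grid, key=lambda g: abs(g - position)))
    return snapped

print(quantize_onsets([0.0, 1.0, 2.6], 8.0))
```

Because the phrase is stretched to fit the loop regardless of how fast it was played, proportions between onsets survive roughly while their metric feel may not.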

In both variations, the fader agogics sets the loop rate (playing speed/tempo). 
Furthermore, there is a way of synchronizing tempo between all algorithms (with 
metroRhythm and quantizeOn variations active) and a tap-tempo function. 

In the outlined quantization process, it is important to note that while the rhythmic 
quantities of the original phrase are to some degree preserved, the rhythmic qualities [5] 
may end up fairly distorted. This is because tempo information is not taken into account 
when capturing the original phrase. Rather than a problem, this is an aesthetic choice. 

In Fig. 9, there are two knobs worth mentioning: loop_start and loop_end. These 
knobs provide an easy way to manipulate the play region of the loop, used by Contour. 
Another way to manipulate it is to directly select with the mouse the region in the 
reference score shown in Fig. 10 (right). 

5.3 Global Parameters and Control Groups 

Besides the two main variations of each algorithm, there are several parameters that can 
be manipulated in real-time which result in diverse musical outcomes. Every 
controllable parameter can be stored in a preset which can later be recalled by the push of a 
single button, enabling a wide range of contrast levels in musical transitions along with 
a firm control over musical form. 

Regarding the configuration of our control surface (see Sect. 3.1), it was necessary 
to come up with a layout that would be both practical and efficient. This layout would 
have to provide a balanced and intuitive way to control expressive parameters in 
any performing context, considering all possible combinations of active algorithms. 
Since the control surface has 8 well-sized faders (which are ideal to operate gradual 
transitions between parameter states) and 16 conveniently placed buttons (which are 
well suited to alternating between states or triggering transformations) (see Fig. 8), the 
solution for this was to assign half of those controllers to control group 1 and the other 
half to control group 2. Therefore, faders 1 to 4 (left-hand side) control parameters 
named dynamics, agogics, articulation and density of group 1 while faders 5 to 8 
(right-hand side) control the same named parameters of group 2. 
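
The fader layout described above reduces to a small lookup, sketched here in Python for illustration. The function name `fader_target` is hypothetical; the group split and parameter order follow the text.

```python
PARAMETERS = ("dynamics", "agogics", "articulation", "density")

def fader_target(fader_number):
    """Map a fader (1-8) to its (control group, parameter name) pair:
    faders 1-4 address group 1 and faders 5-8 address group 2."""
    if not 1 <= fader_number <= 8:
        raise ValueError("the control surface has faders 1 to 8")
    group = 1 if fader_number <= 4 else 2
    return group, PARAMETERS[(fader_number - 1) % 4]

for fader in range(1, 9):
    print(fader, fader_target(fader))
```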

It is important to point out that each group may control two different algorithms at 
the same time (hence the efficiency of this layout), although it is possible (and maybe 
even sensible) to activate only one of them at a time in each group. It should also be 
noted that musical parameters of different algorithms are in fact different procedural 
parameters and may behave slightly differently, despite having the same name. That 
being said, dynamics will always be dynamics, articulation will always control the 
relative length of notes, and density will always control the ratio of assigned instruments 
that will actually play. Agogics fader functionality is explained in Sects. 5.1 and 5.2. 
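
As an illustration of the density parameter, the Python sketch below selects a share of the assigned instruments to play. Only the idea of a ratio comes from the text; choosing the players at random, rounding to the nearest whole number and the function name `select_players` are assumptions.

```python
import random

def select_players(assigned, density, rng=None):
    """Pick round(density * n) of the n assigned instruments to play.
    density is a ratio between 0.0 (nobody) and 1.0 (everybody)."""
    rng = rng or random.Random()
    count = round(density * len(assigned))
    return rng.sample(list(assigned), count)

ensemble = ["flute", "oboe", "tuba", "violin"]
print(select_players(ensemble, 0.5))  # two of the four instruments
```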

Other graphical objects in the interface which are seen in Fig. 10 (left and right) 
facilitate the control over instrument assignment, musical durations (for Harmony’s 
metroRhythm variation), and lyrics for singers. Most of these objects (Max objects 
nodes and multislider [14]) are used to control probability weights. 

6 Discussion 

A real-time notation system relying mainly on staff notation involves a considerable 
amount of failure expectation. Sight-reading is a difficult task and errors of pitch and 
timing are bound to occur. Thus, such a system must consider incorporating this failure 
factor as part of its aesthetics. For example, Nick Didkovsky’s work “Zero Waste”, for 
piano and real-time transcription algorithm, takes advantage of the failure expectation, 
using the mistakes of the performer and the limitations of the transcription algorithm to 
allow the music to evolve [8]. 

In Comprovisador, incorporation of the failure factor is done in several ways. In the 
algorithm Harmony, a reading time window is always present. In most cases, it allows 
the player enough time to read and prepare the note on his or her instrument, although 
this depends highly on many factors, ranging from note rate to type of instrument to 
performer’s experience. The aesthetic potential of the reading time window is expressed 
when the soloist is able to predict the delay of the response and interact with it. 

The first variation of algorithm Contour requires musicians to avoid synchronizing 
with each other. This is intended as a means to allow a relaxed approach by the 
sight-reader to the displayed melody, while creating a potentially interesting
heterophonic texture. When activating the quantized variation, the notes of the original
melody remain the same (see Sect. 5.2). This gives the performer the opportunity to 
focus solely on the new rhythm. From the listener’s standpoint, we find it is interesting 
to perceive the transition from non-sync to synchronized and vice-versa. 

The different approaches to note selection and transposition in chord generating 
(Harmony) have proven useful in creating coherent and contrasting harmonic fields.
In particular, the default modular transposition mode enables a kind of symmetry
mixed with unexpectedness which we find appealing. On the other hand, the fixed 
register mode (no transposition) seems to be ideal to respond to soloists who play 
harmonic instruments (depending on their style of playing), since it provides a very 
coherent harmonic field - one could say reverberant or even belonging to the realm of 
real-time orchestration. 

Using both control groups (four slots in total), any combination of the two
algorithms may be activated at the same time, assigned to different instrumental groups and
performing different responses which complement each other. As a simple example, 
some instruments may play a soft drone with manipulation in dynamics while others 
skim across a high-pitched melodic fragment with agogics mediation. This has the 
potential to create very interesting musical structures, especially if planned in advance 
with the help of the preset manager for mastering musical form. 

Furthermore, the messenger system can be used to quickly and effortlessly send 
instructions to a group of musicians. For example, with two clicks it is possible to send 
the following: to all brass > multiphonics. Simple instructions such as these, sent 
simultaneously to a specified group of instruments, have a very powerful effect: the 
audience clearly perceives the coordinated response - and the soloist as well, thus, 
fostering the feedback loop. 

The final version of the bouncing ball synchronization mechanism, however, is yet to
be tested in a performance situation. The previous version used the MGraphics system
(JavaScript), which did not perform at a desirable frame rate. Combined with network latency
and sight-reading related failure, this mechanism has not yet proved to be as effective as
we had hoped. Testers who have compared both versions reported an improvement,
which we hope to assess in future performances.

7 Conclusions and Future Plans 

We have been developing Comprovisador since 2015 and we are glad to begin to see in 
it signs of maturity. The performances that were carried out and the rehearsal stages 
that preceded them - in particular the latest stage which spanned over a four-month 
period - were crucial for identifying and correcting problems both on the technical and 
musical side. Every aspect discussed here has been tested and proven to work in a 
reliable fashion with the hardware available and with college level musicians. 

In the short term, we plan to implement a data sequencer module which will make it 
possible to record all musical data generated during a performance. This will give us
an important tool for analysis. Also, in a performing context, this will allow for the use of
recurrence by playing back passages that were recorded during the same performance. 
In the same line of thought, it will be possible to use precomposed material, based on 
recordings made in previous sessions. Such material may be practiced beforehand by 
the musicians, bringing a different aesthetic approach to the original concept, closer to 
open-form compositions. Nonetheless, the material will still be algorithmically
composed from an improvised source and displayed as real-time notation.

Abstract. It would seem that the notion of musical inversion is one
of the simplest and least mysterious: inversions are just run-of-the-mill
symmetries around axes. However, much depends on the context and even
more on the model wherein inversions are used. For instance, in
neo-Riemannian theory, one talks of the local inversion R - turning a triad
into its relative -, though its actual effect on pitch-classes depends on
which triad R is applied to: the connection with inversions in the circle of
pcs is tenuous at best. Other models turn R into a global operation, but
at the cost of the essential relation $R^2 = \mathrm{Id}$, while still other contexts
make it possible to embed operations on points into the more general operations
on (most) pc-sets, in a natural and visual way. This paper purports to
synthesize the most important situations and help understand and/or
picture what an inversion really is, in its full complexity.


Keywords: Inversion • Local symmetry • Tonnetz • T/I • Homometry •
Spectral units • Torus of phases


1 Inversions on Circle and Tonnetz 

Though inversions can be, and are, used on the whole line of pitches, the present
paper will focus on pitch-classes modulo octave: pcs are modeled as integers
modulo 12 and chords, scales, collections of notes as subsets of the cyclic set
$\mathbb{Z}_{12}$. Most considerations throughout this paper can be applied to the more
general $\mathbb{Z}_n$.

1.1 The Simplest Model 

The reader is assumed to be familiar with the $I_k$ operators defined as

$$I_k(x) = k - x \pmod{12}, \qquad I_k^2 = I_k \circ I_k = \mathrm{Id}.$$

These operations generate the dihedral group 1 $D_{12} = T/I$ where the product of
two inversions is a transposition (mathematically speaking, a translation):

$$I_k \circ I_\ell(x) = k - (\ell - x) = x + (k - \ell) = T_{k-\ell}(x)$$

and the (maximal abelian) subgroup T of all transpositions is normal.

Fig. 1. Action of $I_7$ on a pitch-class/on a triad


As often, simplest may be best and indeed inversions are easily visualized on 
the cyclic (or Kremer) model of pcs. 
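
The two identities of this simplest model can be checked exhaustively. The following is only a minimal sketch in Python: the helper names (`T`, `I`, `compose`, `same_map`, `invert_set`) are illustrative choices of ours, not notation taken from the literature.

```python
N = 12  # pitch classes are integers modulo 12


def T(k):
    """Transposition by k semitones."""
    return lambda x: (x + k) % N


def I(k):
    """Inversion I_k(x) = k - x (mod 12)."""
    return lambda x: (k - x) % N


def compose(f, g):
    """Return f o g, i.e. apply g first, then f."""
    return lambda x: f(g(x))


def same_map(f, g):
    """Two maps on Z_12 are equal when they agree on all 12 pcs."""
    return all(f(x) == g(x) for x in range(N))


def invert_set(k, pcs):
    """Image of a pc-set under I_k (pointwise action on subsets)."""
    return frozenset(I(k)(x) for x in pcs)


identity = T(0)
# Every inversion is an involution: I_k o I_k = Id.
involutions = all(same_map(compose(I(k), I(k)), identity) for k in range(N))
# The product of two inversions is a transposition: I_k o I_l = T_{k-l}.
products = all(
    same_map(compose(I(k), I(l)), T(k - l))
    for k in range(N) for l in range(N)
)
c_major = frozenset({0, 4, 7})
print(involutions, products, sorted(invert_set(7, c_major)))
```

Applying `invert_set(7, ...)` to the C major triad yields the pcs of C minor, which is the situation depicted in Fig. 1.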

In Fig. 1 one can see the inversion $I_7$ and its axis 2; of course, inverting a
whole collection of notes - here a triad - involves two pictures or some more
complicated convention. This is hardly seen as a nuisance, since it appears to be
unavoidable. However, other models like the dual Tonnetz below (or orbifolds, see
[15] and Sect. 4) feature whole collections (pc-sets) as single points, suggesting
alternative solutions.


1.2 The Tonnetz, Its Dual, Its Group 

Of all models purporting to study relationships between similar chords/pc-sets,
one of the most popular is the neo-Riemannian Tonnetz. On the original Tonnetz
(Fig. 2), points are pcs aligned in fifth and third order. Hence the triangles are all of
the major and minor triads. A symmetry around a side of one such triangle exchanges
it for one of its three neighbors; these symmetries are the three fundamental
neo-Riemannian inversions L, P, R where P, for instance, switches X minor and X
major by moving the mediant.

In the context of this paper it is better to look at the dual Tonnetz (Fig. 3),
whose vertices are the triads themselves and whose edges are their common pcs.

The Tonnetz has proved its worth as a powerful tool in the analysis of actual music,
both in describing paths of chords (or tonalities) and in encoding chord transitions.
Of course the main drawback is that the PLR operations are, albeit inversions,
not constant inversions as in the T/I group: for instance R, when considered for
C major (or (0 4 7)), is the inversion $I_4$, turning C maj into A min (or (4 0 9));
alternatively, R in the context of G maj is $I_6$. In general, R is the inversion indexed
by the sum of the pcs in the major third of the triad (similar rules apply to L and
P): there is a localization operator which identifies which inversion on pcs

1 Some authors call it $D_{24}$, pinpointing its cardinality.

2 Already the circle model induces this side effect, that a center of symmetry
is turned into an axis: the equation $I_k(x) = x$ has two solutions, not one (if k is odd, the
fixed points are half-integers).

Fig. 2. The Tonnetz, a modern view (Wikipedia image)

Fig. 3. The Dual Tonnetz


must be used on a given triad. 3 Mathematically, the group T/I cannot be mapped
in a natural (pc-preserving) way into the group generated by the PLR operations. 4
However, the latter is indeed a group and, somewhat surprisingly, the PLR group is
isomorphic to T/I: it is a dihedral group. 5
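
The contextual nature of R can be made concrete with a short Python sketch. It assumes the rule stated above (the index of the inversion is the sum of the two pcs forming the major third of the triad); the function names are ours and purely illustrative.

```python
N = 12


def major(k):
    """Major triad with root k."""
    return frozenset({k % N, (k + 4) % N, (k + 7) % N})


def minor(k):
    """Minor triad with root k."""
    return frozenset({k % N, (k + 3) % N, (k + 7) % N})


def invert(index, pcs):
    """Image of a pc-set under I_index."""
    return frozenset((index - x) % N for x in pcs)


def r_index(root, is_major):
    """Index of the inversion acting as R on the given triad.

    It is the sum of the two pcs of the major third of the triad:
    root + (root + 4) for major, (root + 3) + (root + 7) for minor.
    """
    return (2 * root + 4) % N if is_major else (2 * root + 10) % N


def R(root, is_major):
    triad = major(root) if is_major else minor(root)
    return invert(r_index(root, is_major), triad)


# R sends X major to its relative minor (a minor third below) and back.
relative_ok = all(
    R(k, True) == minor(k - 3) and R(k - 3, False) == major(k)
    for k in range(N)
)
# The index varies with the triad: R is a local, not a constant, inversion.
indices = sorted({r_index(k, True) for k in range(N)})
print(relative_ok, r_index(0, True), r_index(7, True), indices)
```

The same operation R is thus realised by six different inversions over the twelve major triads, which is what "local" means here.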



3 Here R, when applied to $X_{maj} = k + (0, 4, 7)$ or $X_{min} = k + (0, 3, 7)$, is $I_{2k+4}$ in the
major case, and $I_{2k+10}$ in the minor case. Such maps from a local structure - say a
manifold - into the linear group of its tangent space - here, its isometries - appear
in other domains, such as the theoretical physics of fields, or presheaves in category theory. The
latter have already been applied to Music Theory in [11], of course.

4 Here we compare PLR with the left-action of T/I on triads, i.e. the image of a triad is 
the triad of the images of its elements. See [12] for a study of the right-action, when the 
set of triads is identified with the images of one triad by T/I in the context of G.I.S. 

5 Essentially because a group acting simply transitively on major/minor triads while
discerning between both kinds - meaning there is a normal subgroup of
transpositions - must be $D_{12}$, though there are 48 isomorphic versions. Moreover, there are two
‘good’ ways to define the Tonnetz group, see [7] which pinpoints the extraordinary
isomorphism between T/I (acting on pcs) and the PLR group (defined on the
Tonnetz) both as subgroups of the 620,448,401,733,239,439,360,000 permutations of all
24 triads!

It may be debatable whether it is the group structure that is actually used in
analysis. What is beyond doubt is that we have lost the power to simultaneously
invert pcs and triads, since inversions have become local, i.e. contextual: what
acts as P between C major and C minor, i.e. $I_7$, also acts as P on F♯ major
or minor but as a different operation on any other triad (for instance, between
D minor and B♭ major, $I_7$ is the L operation). Should this be remedied? How?
What of other pc-sets?

2 Focusing on What Really Changes: Inversion in an 
Orbit of Homometric Sets 

One possible line of thought is to realize that inversions are special cases of 
interval-content preserving operations. Indeed there are no other trichords than 
maj/min triads featuring a major and minor third and a fifth, but tetrachords 
such as (0 1 4 6) and (0 1 3 7) are known to exhibit the same interval distribution
though they have different shapes. This is the famous Z-relation, more properly
called homometry [9]. Though for musicians it appeared as a byproduct of the
study of the interval content iv 6, for crystallographers who invented the notion it
is better defined in terms of diffraction, i.e. Fourier Transform. Indeed diffraction 
is created by gaps, holes, intervals between objects (the atoms in a crystal), 
and the diffraction pattern results from a formula summing different sine waves 
(corresponding to the diverse paths that contribute to adding, or subtracting, 
light on a given point) which amounts to a Fourier Transform. 


2.1 Homometry 

Leaving crystals aside for the time being, a workmanlike discussion of homometry 
for musicians is the following: 

1. Two objects (say subsets of $\mathbb{Z}_{12}$) are homometric if they share the same
intervallic distribution.

2. The intervallic distribution IFunc is a convolution product. If $A \subset \mathbb{Z}_n$ for
instance,

$$\mathrm{IFunc}_A(k) = \#\{x \in A \mid x - k \in A\} = \sum_{x \in \mathbb{Z}_n} 1_A(x)\, 1_A(x - k) = \sum_{x \in \mathbb{Z}_n} 1_A(x)\, 1_{-A}(k - x) = (1_A * 1_{-A})(k)$$

where $1_A$ is the characteristic map 7 of set A.


6 Or, more generally, of the IFunc of two pc-sets. Here we focus on intervals within one
pc-set, i.e. $\mathrm{IFunc}_A$ is essentially the interval vector $iv_A$ up to definition conventions.

7 If necessary, one can easily generalize to any distribution on $\mathbb{Z}_n$ - for instance
multisets, wherein any pc can appear not only 0 or 1 time, but with any real value.



Strange Symmetries 139 


3. The Fourier transform of a pc-set A is the Fourier transform of $1_A$, namely
the map defined by the following values (called Fourier coefficients)

$$\mathcal{F}_A(t) = \widehat{1_A}(t) = \sum_{k \in \mathbb{Z}_n} 1_A(k)\, e^{-2ik\pi t/n} = \sum_{k \in A} e^{-2ik\pi t/12}$$

in the simple case that we are studying.

4. Fourier transform turns convolution product $*$ into termwise product 8 $\times$,
with the following consequence of note:

$$\widehat{\mathrm{IFunc}_A} = |\widehat{1_A}|^2.$$

This value is essentially the light observed at a given point on a diffraction 
figure, vindicating the crystallographic interpretation. It is also the usual, 
modern definition of homometry: 

Two objects are homometric if and only if their Fourier coefficients 
have the same magnitude. 

5. In particular, two isometric objects are homometric, among them a pc-set
and its inversions. The converse is false, but the question of finding all
discrete non-trivial homometric orbits is still a formidable open problem, cf.
[9]. We circumvent it by the subterfuge of introducing a continuous context:
instead of characteristic functions of sets, taking values 0 or 1, we allow general
distributions, i.e. maps from $\mathbb{Z}_n$ to $\mathbb{R}$ or even $\mathbb{C}$.
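
The five points above can be illustrated on the two Z-related tetrachords mentioned earlier. This is only a sketch using the Python standard library; the function names are ours, and the sign convention of the transform follows the formula for the Fourier coefficients given in point 3.

```python
import cmath

N = 12


def ifunc(pcs):
    """IFunc_A(k) = number of x in A such that x - k is also in A."""
    return [sum(1 for x in pcs if (x - k) % N in pcs) for k in range(N)]


def dft(values):
    """Fourier coefficients of a distribution on Z_12 (list of 12 values)."""
    return [sum(v * cmath.exp(-2j * cmath.pi * k * t / N)
                for k, v in enumerate(values))
            for t in range(N)]


def indicator(pcs):
    """Characteristic function 1_A as a list of 0s and 1s."""
    return [1 if k in pcs else 0 for k in range(N)]


def magnitudes(pcs):
    return [round(abs(c), 9) for c in dft(indicator(pcs))]


a = {0, 1, 4, 6}
b = {0, 1, 3, 7}

# Same intervallic distribution, hence same Fourier magnitudes.
same_ifunc = ifunc(a) == ifunc(b)
same_magnitudes = magnitudes(a) == magnitudes(b)

# The transform of IFunc_A is the squared magnitude of the transform of 1_A.
power = [round(abs(c) ** 2, 9) for c in dft(indicator(a))]
ifunc_hat = [round(c.real, 9) for c in dft(ifunc(a))]

print(ifunc(a))
print(same_ifunc, same_magnitudes, power == ifunc_hat)
```

Both sets are distinct as shapes (neither is a transposition or inversion of the other), yet every test above succeeds, which is the Z-relation expressed in Fourier terms.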

2.2 Transformations Between Homometric Distributions 

In a way, it is trivial to find all transformations that permute all pc-sets
homometric to a given one: select those permutations of (the subsets of) $\mathbb{Z}_{12}$
which work, and only those! Together they form a subgroup of all permutations,
which can be found by appropriate software and described by relations. This is
useful for compositional applications if one has an eye on a particular class of
homometric objects, see [8].

On the other hand, this a posteriori approach does not allow one to predict or
understand the size and structure of the group involved; for instance, for A =
(0 1 4 6) in $\mathbb{Z}_{12}$, a group with 48 elements is found, which acts simply transitively
on the 48 homometric tetrachords. It is better to look at $\mathrm{IFunc}_A$ and observe
that A is an all-interval set, each interval occurring once and only once (except
prime and tritone for obvious reasons); this entails that any affine transform of
$\mathbb{Z}_{12}$, i.e. any map

$$x \mapsto ax + b, \qquad a \in \{1, 5, 7, 11\},\ b \in \mathbb{Z}_{12},$$


8 Meaning $\widehat{f * g} = \hat{f} \times \hat{g}$, i.e. the Fourier coefficients of the convolution product $f * g$ are
obtained by multiplying the corresponding coefficients for $f$ and $g$. This wonderful
feature (noticeably simplifying computation of any convolution-related operation) is
essentially a characterization of discrete Fourier transform, cf. [1], Theorem 1.11.

since it permutes the intervallic distribution in general, will preserve it in this
case; and hence these 48 operations coincide with the affine group modulo 12.
This case is also gratifying in that the group operates simply transitively, a nice
case of a non-abelian Generalized Interval System. 9
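
The count of 48 can be reproduced by brute force. The sketch below (plain Python, illustrative names) enumerates the images of (0 1 4 6) under the 48 affine maps and checks that they are pairwise distinct and all homometric to the original set.

```python
N = 12


def ifunc(pcs):
    """Intervallic distribution of a pc-set, as a tuple indexed by interval."""
    return tuple(sum(1 for x in pcs if (x - k) % N in pcs) for k in range(N))


def affine_image(a, b, pcs):
    """Image of a pc-set under x -> a*x + b (mod 12)."""
    return frozenset((a * x + b) % N for x in pcs)


A = frozenset({0, 1, 4, 6})
images = {affine_image(a, b, A) for a in (1, 5, 7, 11) for b in range(N)}

# 48 maps give 48 different sets: the action on this orbit is simply transitive.
orbit_size = len(images)
all_homometric = all(ifunc(s) == ifunc(A) for s in images)
contains_0137 = frozenset({0, 1, 3, 7}) in images
print(orbit_size, all_homometric, contains_0137)
```

In particular (0 1 3 7) is the image of (0 1 4 6) under x -> 5x + 7.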

However, there is no universal way (in this context of permutations) to find 
the ‘good’ group acting on the orbit of all pc-sets homometric to a given one, 
according to 

Theorem 1 (Mandereau 2011). For n = 8 or n > 10, there is no subgroup
G of $S_n$ (acting both on points and on subsets of $\mathbb{Z}_n$) such that for any subset A
of $\mathbb{Z}_n$, the orbit G.A is equal to all pc-sets homometric with A.



Fig. 4. Quantic transitions in the Bullvalene molecule 


Nonetheless, in many situations we want to understand what the
transformation exactly is, the ‘magical gesture’ that turns A into one of its homometric
counterparts. The present paper was actually triggered by a Chemistry article
[4] where the 1,209,600 states of the Bullvalene molecule and their transitions by
tunnel effect (see Fig. 4) are described with a matricial formalism, very similar to
the one we will presently introduce. 10 This formalism was originally used in [3]
for the purpose of algebraically combining multisets (or scales, chords, rhythms)
and is thoroughly developed in [1] where proofs and details can be found.

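Definition 1. The matrix $S$ associated with a distribution $s : \mathbb{Z}_n \to \mathbb{C}$ is the
circulating matrix whose first column is $(s(0), s(1), \ldots, s(n-1))$. In the usual
case of a pc-set $A \subset \mathbb{Z}_n$, with distribution $1_A$, we denote the associated matrix
as $\mathcal{A}$.

For a compact example, here is the matrix associated with the C major triad
(distribution (1 0 1 0 1 0 0)) in a 7-note universe:

$$\begin{pmatrix}
1 & 0 & 0 & 1 & 0 & 1 & 0 \\
0 & 1 & 0 & 0 & 1 & 0 & 1 \\
1 & 0 & 1 & 0 & 0 & 1 & 0 \\
0 & 1 & 0 & 1 & 0 & 0 & 1 \\
1 & 0 & 1 & 0 & 1 & 0 & 0 \\
0 & 1 & 0 & 1 & 0 & 1 & 0 \\
0 & 0 & 1 & 0 & 1 & 0 & 1
\end{pmatrix}$$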


The important feature is the obvious shift from one column to the next.
Mathematically speaking, the strong point of this model is that matrix
multiplication has a familiar meaning:


9 Notice that, in general, a chord and its inverse constitute the simplest GIS, with the
dihedral group $D_1$ made up of the inversion and identity.

10 The different states are modeled as symmetrical real matrixes with same spectra, 
hence transitions between them are achieved by way of unitary matrixes, just like 
the spectral units defined below.

Proposition 1. The matrix associated with the convolution product $s * t$ of two
distributions is $S \times T$.
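
Proposition 1 can be checked numerically on the 7-note example above. The following sketch is illustrative only: the helper names are ours, and the matrix is built directly from the stated definition (first column equal to the distribution, each column shifted from the previous one).

```python
def circulant(s):
    """Circulating matrix with first column s: entry (i, j) is s[(i - j) mod n]."""
    n = len(s)
    return [[s[(i - j) % n] for j in range(n)] for i in range(n)]


def convolve(s, t):
    """Cyclic convolution (s * t)(k) = sum over x of s(x) t(k - x)."""
    n = len(s)
    return [sum(s[x] * t[(k - x) % n] for x in range(n)) for k in range(n)]


def matmul(a, b):
    """Plain product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][m] * b[m][j] for m in range(n)) for j in range(n)]
            for i in range(n)]


c_major_7 = [1, 0, 1, 0, 1, 0, 0]  # C major triad in a 7-note universe
j = [0, 1, 0, 0, 0, 0, 0]          # distribution of the one-step transposition

first_row = circulant(c_major_7)[0]
transposed = convolve(c_major_7, j)
# Matrix of the convolution product equals the product of the matrices.
product_ok = (matmul(circulant(c_major_7), circulant(j))
              == circulant(transposed))
print(first_row, transposed, product_ok)
```

The first row printed matches the first row of the matrix displayed above, and convolving with `j` shifts the triad by one step.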

For instance it is easily checked that the matrix for $\mathrm{IFunc}_A$, the
distribution of intervals, is simply $\mathcal{A}^T \times \mathcal{A}$. Moreover, since all these circulating
matrixes are polynomials in the matrix $J$ associated with $j = (0, 1, 0, \ldots, 0)$,
they are all diagonalisable in the same linear basis $(\omega_0, \ldots, \omega_{n-1})$, where $\omega_k =
(1, e^{2ik\pi/n}, \ldots, e^{2ikm\pi/n}, \ldots)^T$ is an eigenvector for $J$ with eigenvalue $e^{-2ik\pi/n}$.
The passing matrix $\Omega$ whose columns are the $\omega_k$'s is usually normalized by
$1/\sqrt{n}$, making it unitary. In this basis it is classically checked that

Proposition 2. Any circulating matrix $S$ is diagonalisable in $(\omega_0, \ldots, \omega_{n-1})$. The
eigenvalues of $S$ are the Fourier coefficients of distribution $s$.
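
As a numerical sanity check of Proposition 2, the sketch below verifies for the C major triad in $\mathbb{Z}_{12}$ that each vector $\omega_k$ is an eigenvector of the associated matrix, with the k-th Fourier coefficient as eigenvalue. The function names are ours; the transform uses the sign convention of the Fourier coefficients defined earlier.

```python
import cmath

N = 12


def circulant(s):
    """Circulating matrix with first column s."""
    n = len(s)
    return [[s[(i - j) % n] for j in range(n)] for i in range(n)]


def fourier(s, t):
    """Fourier coefficient of distribution s at index t."""
    n = len(s)
    return sum(s[k] * cmath.exp(-2j * cmath.pi * k * t / n) for k in range(n))


def omega(k, n):
    """Basis vector omega_k = (1, e^{2ik pi/n}, ..., e^{2ikm pi/n}, ...)."""
    return [cmath.exp(2j * cmath.pi * k * m / n) for m in range(n)]


def apply(matrix, vector):
    """Matrix-vector product."""
    return [sum(row[m] * vector[m] for m in range(len(vector)))
            for row in matrix]


s = [1 if k in {0, 4, 7} else 0 for k in range(N)]  # C major triad
S = circulant(s)


def is_eigenpair(k):
    """Check S * omega_k == fourier(s, k) * omega_k up to rounding."""
    w = omega(k, N)
    lam = fourier(s, k)
    return all(abs(x - lam * y) < 1e-9 for x, y in zip(apply(S, w), w))


all_eigenpairs = all(is_eigenpair(k) for k in range(N))
print(all_eigenpairs)
```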

It follows 

Theorem 2. A and B are homometric if and only if there exists a unitary
circulating matrix $U$ such that $\mathcal{A} = U \times \mathcal{B}$.

This means that $U$ lies in the same sub-algebra of circulating matrixes, but that
its eigenvalues have magnitude 1 (so that A and B have Fourier coefficients with
the same magnitude). Equivalently, one can consider the associated distribution
$u$ and state $1_A = u * 1_B$.

Definition 2. Such a matrix (or distribution) is called a spectral unit. The set 
SU of all spectral units is a multiplicative abelian group. 

By diagonalisation, this group is isomorphic with the group of diagonal matrixes
whose eigenvalues lie on the unit circle, hence it is topologically equivalent to
the torus $\mathbb{T}^n$.

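Example 1. From (0 1 4 6) to (0 1 3 7), the spectral unit is

$$u = \left(\tfrac{1}{4}, \tfrac{1}{4}, 0, \tfrac{1}{4}, -\tfrac{1}{4}, -\tfrac{1}{2}, \tfrac{1}{4}, \tfrac{1}{4}, 0, \tfrac{1}{4}, -\tfrac{1}{4}, \tfrac{1}{2}\right).$$

The “unit” thing is clearly visible on its eigenvalues, whose magnitude is always
1, namely

$$\hat{u} = \left(1, e^{i\pi/6}, e^{-i\pi/3}, i, 1, e^{5i\pi/6}, -1, e^{-5i\pi/6}, 1, -i, e^{i\pi/3}, e^{-i\pi/6}\right).$$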

This characterization of homometry (with Fourier transform) originates in [14]
(1982/84). It embodies the idea of equality of diffraction patterns, the original
problem set by crystallographers.
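
A spectral unit between two homometric pc-sets can be computed by dividing Fourier coefficients. The sketch below is a minimal illustration under one assumption of ours: the unit `u` from a source set to a target set is taken to satisfy indicator(target) = u * indicator(source). Function names are illustrative.

```python
import cmath

N = 12


def indicator(pcs):
    return [1 if k in pcs else 0 for k in range(N)]


def dft(s):
    return [sum(s[k] * cmath.exp(-2j * cmath.pi * k * t / N) for k in range(N))
            for t in range(N)]


def idft(c):
    return [sum(c[t] * cmath.exp(2j * cmath.pi * k * t / N) for t in range(N)) / N
            for k in range(N)]


def convolve(s, t):
    return [sum(s[x] * t[(k - x) % N] for x in range(N)) for k in range(N)]


def spectral_unit(source, target):
    """Distribution u with indicator(target) = u * indicator(source).

    Only defined when no Fourier coefficient of the source vanishes.
    """
    fs, ft = dft(indicator(source)), dft(indicator(target))
    return idft([b / a for a, b in zip(fs, ft)])


u = spectral_unit({0, 1, 4, 6}, {0, 1, 3, 7})
u_real = [round(v.real, 6) for v in u]
# All eigenvalues (Fourier coefficients of u) lie on the unit circle.
eigen_magnitudes = [round(abs(c), 6) for c in dft(u_real)]
# Convolving u with (0 1 4 6) gives back a genuine 0/1 set: (0 1 3 7).
image = [round(v) for v in convolve(u_real, indicator({0, 1, 4, 6}))]
print(u_real)
print(eigen_magnitudes)
print(image == indicator({0, 1, 3, 7}))
```

Under the opposite convention (unit applied to the target to recover the source) the entries of `u` would appear in reversed cyclic order, since the inverse of a real spectral unit is its reflection.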

This lengthy exposition reaches a satisfying conclusion: 

Proposition 3. For any distribution $s$, the homometric distributions are all the
$u * s$ [or equivalently their matrixes are the $U \times S$] where $u$ ranges over the group
SU of all spectral units [$U$ being its associated matrix].

We have constructed a satisfying group whose orbits correspond exactly to
homometry. Moreover, most of the time 11 the group of Spectral Units acts simply
transitively on an orbit, i.e. the latter and the group form a G.I.S.

Is this then a violation of Theorem 1? Not at all: the discrete homometric
subsets (whose distribution is a characteristic function, with values 0 or 1) are
undoubtedly somewhere in the continuous orbit, but finding them all is no easier
than before. 12

One possible reduction is to the finite subgroup of rational spectral units
with finite order. One can imagine a spectral unit as a set of several clocks, each
rotating one eigenvalue by some angle; the unit has finite order if all clocks
have a common period. The applet in Fig. 5 features all available values of such
spectral units for n = 12 applied to any pc-set. The result is not always a genuine
set, i.e. it may display values different from 0 or 1. The classification and
computation of all such spectral units was achieved by the tricky Theorem 2.11
in [1].

However we are getting close to the stumbling block of this nice modelization: 
in this large, continuous group, some spectral units have infinite order, and 
notably this is the case for neo-Riemannian inversions! 


2.3 Transpositions and Inversions as Spectral Units 

The case of musical transpositions is straightforward: in mathematical terms,
their group T is mapped to the subgroup generated by the spectral unit
$j = (0, 1, 0, 0, \ldots, 0)$. In other words, applying $T_k$ (to a pc or a pc-set or any
distribution) is equivalent to multiplying the appropriate circulating matrix by
$J^k$. 13
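
In terms of distributions rather than matrices, this says that convolving with the k-th convolution power of $j$ transposes by k semitones. A minimal sketch (illustrative names, standard library only):

```python
N = 12


def indicator(pcs):
    return [1 if k in pcs else 0 for k in range(N)]


def convolve(s, t):
    """Cyclic convolution of two distributions on Z_12."""
    return [sum(s[x] * t[(k - x) % N] for x in range(N)) for k in range(N)]


def power(unit, k):
    """k-fold convolution power of a distribution (k >= 0)."""
    result = [1] + [0] * (N - 1)  # neutral element for convolution
    for _ in range(k):
        result = convolve(result, unit)
    return result


j = [0, 1] + [0] * (N - 2)  # spectral unit of the transposition T_1
c_major = {0, 4, 7}

# j^7 * (C major) is G major, i.e. T_7 applied to the triad.
is_g_major = convolve(power(j, 7), indicator(c_major)) == indicator({7, 11, 2})
# j has order 12: twelve semitones bring everything back.
has_order_12 = power(j, 12) == power(j, 0)
print(power(j, 7), is_g_major, has_order_12)
```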

Now for inversions. The general theory provides a spectral unit (and usually 
one only) transforming a pc-set into one of its homometric pc-sets. 

Example 2. From C major (0 4 7) to C minor (0 3 7) 14 the spectral unit of
the inversion is

$$p = \tfrac{1}{15}(7, 4, -2, 1, 7, 4, -2, 1, -8, 4, -2, 1),$$

associated with the Parallel operation P.

We begin with unexpectedly good news: 

11 Exceptions are pc-sets, or distributions, where one or more Fourier coefficients are nil,
i.e. the matrix is singular. These sets are the famous ‘Lewin’s special cases’ whose
definition in his seminal paper [10] was so irredeemably obscure. [1], Sect. 2.2.2,
shows a way round these singularities when the rank of the matrix is $n - 2$.

12 Though maybe the strategy of exploring the continuous orbit for discrete solutions 
warrants further exploration. 

13 Remember that the whole algebra of circulating matrixes is made of polynomials 
in J . It is deeply satisfying in a sense that in this model every single object or 
transformation originates in the single transposition by one semitone. 

14 Evoking the first bars of R. Strauss’s Also sprach Zarathustra.

Fig. 5. An applet: transforms of 0135 by spectral units


Proposition 4. Neo-Riemannian transformations P, L, R in the group of
spectral units are no longer local but global, in the sense that for any major
triad X, X minor is obtained by applying the same spectral unit $p$ as above
(idem for L, R). Moreover, there is a simple, fixed relationship between all three
operations: $p = j^8 * \ell = j^3 * r$ (meaning that $\ell$, $r$ are just circular permutations
of the distribution $p$).

Proof 1. This is a corollary of the commutativity of spectral units. Say X major
is C major transposed by k: $X = T_k(C)$, i.e. $x = j^k * c$, since transposition $T_k$
corresponds with spectral unit $j^k = (0, 0, \ldots, 1, 0, \ldots)$; but since $p$ (and similarly
$\ell$, $r$ or indeed any spectral unit) commutes with $j$, we get

$$p * x = p * j^k * c = j^k * (p * c),$$

meaning that X major is transformed into “C minor transposed by k”, i.e. X
minor. □
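
Proposition 4 lends itself to a direct numerical check. The sketch below is illustrative and rests on an assumption of ours about the index convention: the unit from a source set to a target set satisfies indicator(target) = u * indicator(source). Under the opposite convention the entries of each unit appear in reversed cyclic order, which does not affect the two properties tested.

```python
import cmath

N = 12


def indicator(pcs):
    return [1 if k in pcs else 0 for k in range(N)]


def dft(s):
    return [sum(s[k] * cmath.exp(-2j * cmath.pi * k * t / N) for k in range(N))
            for t in range(N)]


def spectral_unit(source, target):
    """Real distribution u with indicator(target) = u * indicator(source)."""
    ratio = [b / a for a, b in zip(dft(indicator(source)), dft(indicator(target)))]
    return [sum(ratio[t] * cmath.exp(2j * cmath.pi * k * t / N)
                for t in range(N)).real / N
            for k in range(N)]


def major(k):
    return {k % N, (k + 4) % N, (k + 7) % N}


def minor(k):
    return {k % N, (k + 3) % N, (k + 7) % N}


def shift(u, k):
    """Distribution j^k * u, i.e. u rotated by k positions."""
    return [u[(i - k) % N] for i in range(N)]


def close(u, v):
    return all(abs(a - b) < 1e-9 for a, b in zip(u, v))


p = spectral_unit(major(0), minor(0))   # P on C major
# Globality: the same unit turns every X major into X minor.
same_everywhere = all(
    close(spectral_unit(major(k), minor(k)), p) for k in range(N)
)
l = spectral_unit(major(0), minor(4))   # L: C major -> E minor
r = spectral_unit(major(0), minor(9))   # R: C major -> A minor
relations = close(p, shift(l, 8)) and close(p, shift(r, 3))
print(same_everywhere, relations)
```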

Now the bad news: $p, \ell, r$ are no longer involutions! In other words, the
operation from C major to C minor is not the same as the reverse operation:
$p^2 \neq (1, 0, 0, \ldots)$ [identity]. Worse still, $p^n$ never gets back to identity: all
neo-Riemannian operations now have infinite order. 15 In Fig. 6, one can picture on
a torus (a projection of SU) the iteration of R on C major: the only other true
triad in the infinite orbit is the next one, A minor (however any of the 24 triads
is approached infinitely many times by this orbit).
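
The failure of involutivity can be observed numerically. In the sketch below (illustrative names; the unit is computed with the assumed convention indicator(target) = u * indicator(source)), the Fourier coefficient of $p$ at index 3 turns out to be $(4 \pm 3i)/5$, the sign depending on the convention. The last function checks with exact integer arithmetic that no power of $4 + 3i$ up to a given exponent is real, so that this eigenvalue does not return to 1 within that range.

```python
import cmath

N = 12


def indicator(pcs):
    return [1 if k in pcs else 0 for k in range(N)]


def dft(s):
    return [sum(s[k] * cmath.exp(-2j * cmath.pi * k * t / N) for k in range(N))
            for t in range(N)]


def convolve(s, t):
    return [sum(s[x] * t[(k - x) % N] for x in range(N)) for k in range(N)]


def spectral_unit(source, target):
    """Real distribution u with indicator(target) = u * indicator(source)."""
    ratio = [b / a for a, b in zip(dft(indicator(source)), dft(indicator(target)))]
    return [sum(ratio[t] * cmath.exp(2j * cmath.pi * k * t / N)
                for t in range(N)).real / N
            for k in range(N)]


def powers_never_real(limit):
    """True if (4 + 3i)^m has a nonzero imaginary part for m = 1..limit."""
    re, im = 4, 3
    for _ in range(limit):
        if im == 0:
            return False
        re, im = 4 * re - 3 * im, 3 * re + 4 * im  # multiply by (4 + 3i)
    return True


p = spectral_unit({0, 4, 7}, {0, 3, 7})
identity = [1] + [0] * (N - 1)
p_squared = convolve(p, p)
is_involution = all(abs(a - b) < 1e-9 for a, b in zip(p_squared, identity))

lam = dft(p)[3]  # the eigenvalue that is not a root of unity
print(is_involution, round(lam.real, 6), round(abs(lam.imag), 6))
print(powers_never_real(1000))
```

The exact check is only a finite verification; that the property holds for every exponent is the statement made in the text, not something this sketch proves.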



Fig. 6. Infinite orbit under R of a triad 


This can be seen, and deplored, when one scrutinizes the eigenvalues of p 
below: one, and only one 16 , has infinite order (in the unit circle as a multiplicative 
group); it is signaled here in boldface. 


p = (i, 


V3 i 
~2 + 2 


1 ,V3 

2 + '~ 2 ' 


4i 

5~’ 


i .Vs Vs 


- + l- 



Of course one could have wished for a nicer embedding of T/I (or its dual,
the PLR group) in the SU group, but this was obviously doomed from the start
since the latter is abelian and the former is not. Nonetheless, we do think that
this way of interpreting P, L and R, and perhaps even the infinite subgroup 17
that they generate in the Spectral Units, should be kept in mind and has many
advantages:

1. First and foremost, these operations are now global. 

2. Though one needs to distinguish between (say) $p$ (from major to minor) and its
inverse $p^{-1}$ (from minor to major), this may well be a blessing in disguise:

15 This stands also for compound operations, like the Slide S (exchanging F minor and 
E major) insofar as they exchange minor and major triads. 

16 Actually the values are repeated backwards and conjugated so that only the first 6 
are featured. 

17 Its topological closure is a subgroup of SU (a finite union of tori with smaller
dimension), whose orbit when acting on one triad contains all of them.

not all musicologists appreciated as much as A. von Oettingen or H. Riemann
the notion that switching from minor to major was the same operation as its
reverse.

3. Also there are numerous cases when inversions between pc-sets do have a
spectral unit with finite order: for instance, from C D F♯ to B♭ C F♯ the
spectral unit has order 6 - it is a square root of a major third transposition.
Up to complementation, inversion and transposition this is the complete list 18:

{0,2}, {0,3}, {0,4}, {0,6}, {0,2,4}, {0,2,6}, {0,3,6}, {0,4,6}, {0,4,8},
{0,1,2,3}, {0,1,2,7}, {0,1,3,7}, {0,1,4,5}, {0,1,5,6}, {0,1,5,8}, {0,1,6,7},
{0,2,4,6}, {0,2,4,8}, {0,2,6,8}, {0,3,6,9},
{0,1,2,4,7}, {0,1,2,6,8}, {0,1,3,5,6}, {0,1,4,7,8}, {0,2,3,4,6}, {0,2,4,6,8},
{0,2,4,6,9}, {0,2,5,7,8}, {0,3,5,6,7},
{0,1,2,3,4,5}, {0,1,2,3,7,8}, {0,1,2,4,5,6}, {0,1,2,4,5,8}, {0,1,2,6,7,8},
{0,1,3,4,7,9}, {0,1,3,5,6,9}, {0,1,3,5,8,9}, {0,1,3,6,7,9}, {0,1,4,5,6,8},
{0,1,4,5,8,9}, {0,2,3,4,6,9}, {0,2,3,5,6,8}, {0,2,3,6,8,9}, {0,2,4,5,7,9},
{0,2,4,6,8,10}.

Alternatively, we might desire to retain the non-commutative structure of
operations and the involutivity of inversions. Nonetheless, we can improve on the T/I
model, inasmuch as we can embed all (or almost all) pc-sets in the same space.

3 Remembering the Other Chords: Inversion 
in the Torus of Phases 

3.1 The Space of Fourier Coefficients’ Phases 

Between homometric objects, we have seen that the magnitude of Fourier
coefficients does not change. The significant dimension is the phase of these
coefficients, i.e. the angle they make (as vectors in the complex plane) with a
reference direction. The variation of magnitudes between diverse pc-sets has been well studied
since 2005 [13], but the study of the space of phases is much more recent [2,16-18].
Indeed, focusing on this angular dimension, which is more about harmony
than shape, can also be done for non-homometric pc-sets, even with differing
cardinalities. It is a very strong asset (which mysteriously escaped the author of
the seminal paper [2]) that this space of phases includes the cyclic model, along
with the circle of fifths, the Tonnetz and most previous models. 19 See in
Fig. 7 the disposition of triads and of fifths, and the chromatic sequence, where
for instance the chromatic line (dotted, blue) appears broken but is in fact unbroken,
coming back time and again from above after disappearing below.

So far, the most useful space of phases is defined as follows: 


18 Unsurprisingly, those pc-sets related to their inverse by an involutive spectral unit 
are those with a symmetry axis, like major sevenths. 

19 Quoting [18]: “... there is a different way of topologically enriching the Tonnetz that 
preserves the musical insights [... ] and leads to a concept of harmonic distance. Such 
mixing of different-cardinality sets is not possible in voice-leading spaces without 
forfeiting their basic geometric properties”.

Definition 3. Let us write down the polar coordinates of the Fourier coefficients
of a pc-set (or distribution):

$$a_k = |a_k|\, e^{i\varphi_k}.$$

Then the pair of angles $(\varphi_3, \varphi_5)$ lives on the Torus of Phases


Fig. 7. Coordinates are $\varphi_3, \varphi_5$. Right, same torus in 3D (Color figure online)


It is a torus because both angles are defined modulo $2\pi$: a representation of that
space on a plane must be understood as gluing the left and right (and also
bottom and upper) sides together, as in Fig. 7 left. A 3D immersion of such a
torus is given on Fig. 7 right, with the solid chromatic line winding around it.

The pertinence of this model for musical analyses accrues every day. Still it 
has two drawbacks: 

1. Some pc-sets (tritones, diminished sevenths) do not have coordinates in $\mathbb{T}_{3,5}$
and cannot be represented as points here. 21

2. Some distinct pc-sets share the same couple of coordinates. 

This last may be seen, not as a bug, but as a feature: it is reasonable to identify
(at least in a bidimensional model) the diatonic C major scale and its
pentatonic (CDEGA). This confusion, or ambiguity, is perceptually prevalent in many

20 J. Yust prefers $\varphi = 2\pi\Phi/12$, where $\Phi$, defined modulo 12, is often an integer and is
easier to compare with simple values such as those of single pcs.

21 This is because a nil Fourier coefficient does not have a phase. One possibility is to
consider that - for instance - an augmented triad has all values of $\varphi_5$ at the same
time and can thus be represented as a vertical line. This enables modulations passing
through such a chord, entering any point of the line and getting out at any other
point, recalling the flexibility of these chords in Douthett’s chickenwire model. See
[2] for an example in Schumann’s Kinderszenen.

dialects of Rock music for instance (guitar solos are often pentatonic, in a
generally diatonic context).

Actually, those two scales share their coordinates with the single pc D, which 
can be explained by its role as a center of symmetry. In the torus of phases, 
not only do we have our cake (the diverse pc-sets positioned according to their 
harmonic relationships), but we can eat it, too (moving pc-sets and pcs by 
exactly the same, simple, geometric operations inherited from T/I). 

3.2 The Dihedral Group Acting in $\mathbb{T}_{3,5}$

This was developed mainly by [18]; see a synthesis from a mathematical angle
in [1], Chap. 6.

Readers familiar with the Tonnetz will readily agree that transpositions in
the torus of phases are simply translations (remembering that opposite sides are
glued together so that going out to the right means going in from the left). Actually
the directions of transpositions by fifths or semitones are marked on Fig. 7. J.
Yust first noticed, and proved, that inversions of any pc-set (including single
pcs) are central symmetries whose centers are the fixed points of the inversion.
More precisely,

Proposition 5. The action of Ik induces a central symmetry on the torus of 
phases: if pcs or pc-sets A and B are symmetrical around a center c (resp. a 
dyad (a b)), then their torus projections are symmetrical around the torus image 
of c (resp. the image of the dyad). 

See for instance on Fig. 7 how the triads (0 4 7) and (0 4 9) are exchanged
by symmetry around the dyad (0 4), as are 0 and 4 - or 7 and 9. The actual
operation on phase coordinates is

$$\begin{pmatrix} \varphi_3 \\ \varphi_5 \end{pmatrix} \mapsto \begin{pmatrix} \alpha - \varphi_3 \\ \beta - \varphi_5 \end{pmatrix}$$

where $\alpha, \beta$ are the sums of the angular coordinates $\varphi_3, \varphi_5$ of single pcs 0 and 4
(they depend on the choice of origin).
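The operation on phase coordinates can be checked numerically. The following is only a minimal sketch, assuming the sign convention $e^{-2i\pi xt/12}$ for Fourier coefficients; the function names (`fourier_coefficient`, `phase`, `invert`, `is_central_symmetry`) are illustrative and do not come from [18] or [1].

```python
import cmath
import math

def fourier_coefficient(pcs, t, n=12):
    """t-th Fourier coefficient of a pc-set: sum of exp(-2i*pi*x*t/n) for x in pcs."""
    return sum(cmath.exp(-2j * math.pi * x * t / n) for x in pcs)

def phase(pcs, t, n=12):
    """Angle of the t-th coefficient in radians (meaningless if the coefficient is nil)."""
    return cmath.phase(fourier_coefficient(pcs, t, n))

def invert(pcs, k, n=12):
    """Inversion I_k: x -> k - x modulo n."""
    return {(k - x) % n for x in pcs}

def same_angle(a, b, tol=1e-9):
    """Compare two angles modulo 2*pi by comparing points on the unit circle."""
    return abs(cmath.exp(1j * a) - cmath.exp(1j * b)) < tol

def is_central_symmetry(pcs, a, b, n=12):
    """Check (phi_3, phi_5) -> (alpha - phi_3, beta - phi_5) for the inversion exchanging a and b."""
    image = invert(pcs, a + b, n)
    return all(
        same_angle(
            phase(image, t, n),
            phase({a}, t, n) + phase({b}, t, n) - phase(pcs, t, n),
        )
        for t in (3, 5)
    )

# The inversion around the dyad (0 4) exchanges the triads (0 4 7) and (0 4 9).
print(invert({0, 4, 7}, 4) == {0, 4, 9}, is_central_symmetry({0, 4, 7}, 0, 4))
```

The check is only meaningful for pc-sets whose third and fifth coefficients are both non-zero, since a nil coefficient has no phase.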

This model includes in particular the Tonnetz and its dual, and all their
shared symmetries. It features a satisfying compromise: T/I acts, and is seen to
act, in the same simple way on all pc-sets, regardless of their cardinalities. 22 For
instance, the R transformation for C major, which is the symmetry around D =
2, can be applied to the whole C major scale, and it is quite satisfying to find it
invariant under this transformation (A natural minor).

However, some remarks are in order: 

1. Because of the circular nature of the torus, caution must be exercised: lynx-eyed
readers will have noticed that another center of symmetry should work
in the last example, namely the single pc 8. Indeed it does, and it is the
same symmetry: but drawing a line (say) through 8 from 047 to 049 involves
crossing the gluing line at the top and bottom of the figure. Yust gives a
nice interpretation of this choice of path in Schubert, see [18].

22 There is an isomorphism between the induced left-action of T/I on subsets of $\mathbb{Z}_n$
and (a subgroup of) the dihedral group of translations/central symmetries on the
torus.

2. The miraculous Proposition 5 stands because the phase of the sum of two
complex numbers with the same magnitude is the mean value of their phases:
this already involves two different centers for the symmetry (dephased by $\pi$),
which is still fine because twice the phases are involved in the computation
above, but falls apart when considering three-sets or more: triad 047 is not
the center of the triangle whose vertices are 0, 4 and 7.

The embedding of T/I into the isometries of the torus of phases cannot be unduly
generalized; this should not deter one from using it properly.

4 Many Other Models 

I have left aside a number of other possible geometric models. Some readers may
have expected a discussion of orbifolds - $k$-uplets of pcs quotiented by some
symmetry group - wherein the ordinary symmetries, such as inversions, retain
their original meaning. For instance, the Moebius strip of unordered dyads (see
[15]) looks very much like the pictures above of the torus of phases. Indeed, it
can be embedded in it, though it lacks single pcs, triplets and all other pc-sets.
However, orbifolds of higher order are limited to pc-sets with fixed cardinalities; 
they are hard to visualize, and feature severe singularities which get in the way 
of picturing the inversions; finally, they do not readily allow a visualization of 
the Tonnetz of triads (in many quotients of the space of 3-pc-sets, all minor 
and major triads appear as one single point): in most cases, the inversions are 
quotiented out, which disqualifies this model from the present discussion. 

A lesser known model with a nicer geometry is the 4D-Model, for all pcs and
(almost all) pc-sets, described by Baroin in [5]. Its natural group of symmetries
(isometries of the 4D-ambient space) is actually isomorphic to the affine group
on $\mathbb{Z}_{12}$, as discussed in [6]. This brings to mind the discussion of IFunc and
homometry above. However, despite close connexions with Fourier coefficients
$a_3$ and $a_4$ for single pcs, which are essentially the complex coordinates of their
representations in this 4D-space viewed as $\mathbb{C}^2$, the nice symmetry features
discussed in Proposition 5 on the torus of phases vanish when pc-sets are considered
in Baroin’s Planet-4D model: there is simply no miracle Proposition 5 this time,
and inversion around a pc or a pair of pcs just does not give the expected result
for most pc-sets.

5 Conclusion 

Remarkably, it appears to be impossible to retain all of the many interesting
features of inversions in only one model. The simple cyclic model is cumbersome
when pc-sets are introduced (as polygons), showing no structure for the Tonnetz
for instance. But the latter obfuscates most pc-sets, and in it P, L and R are local
operations. It was rewarding to obtain a global status for these operations in the
context of spectral units structuring the space of subsets of $\mathbb{Z}_{12}$ - algebraically
extended to the vector space $\mathbb{R}^{12}$ or $\mathbb{C}^{12}$, layered by SU in disjunct tori - but
on the whole, it seems that the local nature of (say) R is too strongly rooted
in the transformation to be conveniently discarded. Finally, the most satisfying
model by far appears to be the torus of phases, where one can see and organize
pcs and pc-sets together, and play with inversions (and transpositions) in a very
visual and obvious sense, enjoying the Tonnetz, the T/I dihedral group
acting on all pc-sets, and much more, on the same 2D-picture.

Anyone interested in the natural extension of T/I to the affine group (or, say,
the Schönberg group for transformations of cyclic tone-rows) would do well to
scrutinize Baroin’s 4D model, but such endeavors are beyond the scope of the
present paper.

Abstract. Several ways to appreciate the diatonicity of a pc-set can
be proposed: Anatol Vieru enumerates connected fifths (or semitones, as
an indicator of chromaticity), Aline Honingh similarly measures ‘interval
categories’ against prototype pc-sets [8]; numerous generalizations of the
diatonic scales have been advanced, for instance John Clough and Jack
Douthett’s ‘hyperdiatonic’ [5], which supersedes Eytan Agmon’s model [1]
and the tetrachordal structure of the usual diatonic, and many others.
The present paper purports to show that magnitudes of Fourier coefficients,
or ‘saliency’ as introduced by Ian Quinn in [9], provide better
measurements of diatonicity, chromaticity, octatonicity... The latter case
may help solve the controversies about the octatonic character of Slavic
music at the beginning of the XXth century, and generally disambiguate the
appreciation of hitherto mostly subjective musical characteristics.


Keywords: Diatonic • Chromatic • Octatonic • Saliency • Fourier transform • Stravinsky


1 Introduction 

Tautologically, the most diatonic seven-note scale is the diatonic scale, i.e. any
collection/pc-set translated from {0,2,4,5,7,9,11} in $\mathbb{Z}_{12}$. Slightly less
obviously, the most diatonic collection in five notes is certainly the pentatonic scale
{0,2,4,7,9}. But how is one to compare, say, {0,2,3,5,7,8,11}, {0,2,4,5,7,9}
or {0,2,4,6,7,11}? The question asked here is “how can one measure (with
some precise, computable definition) the diatonic character of a pc-set?” While
we are at it, it costs nothing to ask this question while replacing ‘diatonic’ with
‘chromatic’ or ‘octatonic’ (other adjectives will appear subsequently). Indeed it
is a vexed issue (see [11]) whether Stravinsky’s music is octatonic; alternatively,
it would be nice to appreciate objectively the evolution of chromaticity
throughout Wagner’s Tetralogy (with Tristan in between) and what remains of it in
Parsifal - similar questions abound.

Of course several answers have been advanced. We will present some of them 
through a few examples, and move on to argue why the most recent one, Ian 
Quinn’s “saliency”, is the best so far. 

Some knowledge of pitch classes and pitch-class set theory is assumed,
along with basic music theory (common scales and chords) and
familiarity with Western music. More elaborate machinery will be developed in
Sect. 1.2 and later.

1.1 Some Examples

Let us focus on four pc-sets occurring at the beginning of Stravinsky’s Rite of
Spring. The first two descending motives articulate C B G E B A, i.e. the pc-set
X = {0,4,7,9,11}. Then D and C♯ are added, making up Y = {0,1,2,4,7,9,11};
it turns into something messier with chromatic quarts in the bass, that cover
the chromatic aggregate. I will complete the sample with the black-keyed motif
in measures 9-12, playing C♯ F♯ D♯ with a G♯ thrown in at the end, i.e.
Z = {1,3,6,8}, and the new descending motif in measures 15-17 playing

T = {0,1,3,6,7,8,9}.

Undoubtedly X can be considered diatonic. After all, it is a subset of a major
scale - better, two major scales. There is, or was, a large current in XXth century
Music Theory that focuses on inclusion relationships - so-called set-complex
theory in American Set Theory, but also the lesser known notion of ‘poor’ and
‘rich’ modes by Anatol Vieru [12] 1, an independent and fairly well contrived
alternative to the previous theory. However, numerous ambiguities arise:

1. How much, exactly, is X diatonic? Can we grade it?

2. In particular, is it more or less diatonic than other 5-note pc-sets, like
{0,2,4,7,9} or {0,2,4,5,7} which are also subsets of diatonic scales?

3. What of sets which are not exactly included in a diatonic mode (like Y, Z)
but almost?

Possible answers, clinging to the set relationships of inclusion and intersection,
take into account the (maximum) number of common notes between a pc-set and
each and every diatonic collection; or the percentage of such common notes
averaged over some common basis (the cardinality of the mode, or 7, for instance).
In the chosen examples, Y shares six notes {0,2,4,7,9,11} with C and G major,
and six others {1,2,4,7,9,11} with D major. On the other hand, Z is included
in no less than four diatonic scales (albeit far from the ones that ‘neighbored’
X or Y), so Z should be rated diatonic - but how much so, when we have so
many diatonic contexts to choose from? 2 Meanwhile, T intersects three diatonic
collections in five notes, seven others in four notes and the remaining ones in no
less than three notes. How diatonic is that? Is it actually more chromatic? Or
octatonic?

I will not waste time advocating against the set-theoretical approach, which 
fails because set-theory is too poor to take into account complex musical 
notions 3 , but rather let the more elaborate models speak for themselves. 


1 In short, in his theory a poor mode is a subset of several rich modes. 

2 Going to extreme cases: is a single note diatonic? What about a minor third? 

3 Among other things, it does not integrate the group structure of intervals modulo 
octave, not to mention subtler features. As G. Mazzola wryly observes in the preface
of [10], it is hopeless to try and apprehend the huge complexity of music with only
the simplest mathematical tools - though this complexity can be reconstructed from
all its simplifications, if one construes ‘simplification’ as ‘forgetful functor’.

The notion of interval vector (iv) is more precise, and provides several
illuminating pieces of information on a pc-set. 4 Simply put (following one of the latest of D.
Lewin’s illuminating comments), it is the probability 5 of hearing a given interval
if two pcs are chosen at random in a given pc-set. Then

$$iv_X(k) = \#\{(a, b) \in X^2 \mid b - a = k\} = \#(X \cap (X + k))$$

i.e. the number of occurrences of interval k between elements of X. 6

Since a diatonic collection has maximal value for iv(5) = iv(7) = 6 (among
7-note scales), it is natural and (important in practice) fairly elementary 7 to
compute $iv_X(5)$ for any pc-set X and compare it against that value.
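The second formula in the definition translates directly into code. This is a minimal sketch in Python; the function name `interval_vector` and the convention of returning all twelve values (without folding k and 12 - k together) are assumptions made for illustration.

```python
def interval_vector(pcs, n=12):
    """Return [iv(0), ..., iv(n-1)] with iv(k) = #(X ∩ (X + k)) in Z_n."""
    pcs = {p % n for p in pcs}
    return [len(pcs & {(p + k) % n for p in pcs}) for k in range(n)]

# The pc-sets of Sect. 1.1, with D the diatonic scale.
D = {0, 2, 4, 5, 7, 9, 11}
X = {0, 4, 7, 9, 11}
Y = {0, 1, 2, 4, 7, 9, 11}
Z = {1, 3, 6, 8}
T = {0, 1, 3, 6, 7, 8, 9}

for name, pcs in [("D", D), ("X", X), ("Y", Y), ("Z", Z), ("T", T)]:
    print(name, interval_vector(pcs))
```

Since symmetries are not factored out, every tritone is tallied twice and iv(0) is simply the cardinality of the set.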


[Figure: bar plots of iv(k), k = 0..11, for D = {0,2,4,5,7,9,11}, X, Y = {0,1,2,4,7,9,11}, Z and T = {0,1,3,6,7,8,9}]

Fig. 1. iv for the diatonic D, X, Y, Z and T


Already iv provides some satisfying information (see Fig. 1): 

- For X, iv(5) = 3 is indeed the maximal coefficient; but it is far below the
value for the diatonic scale, which might express the contextual ambiguity
(too many different diatonic scales include X). On the other hand, iv(1) = 1,
the chromatic value, is quite small with only one semitone.

- For Y, iv(5) = 5 is almost as large as in the case of a diatonic collection.
Notice however that iv(2) is just as large (many whole tones) and iv(1) is
greater than it would be for a diatonic collection.

- For Z, iv(5) = 3 is the largest coefficient and also the maximal possible value
for a 4-note scale, confirming the diatonic character despite the contextual
indetermination of its many diatonic neighbors.

- Lastly, T is much more contrasted, with iv(6) a clear maximum 8 and other
coefficients between 3 and 4.

4 The machinery involved, as we will develop below, is actually an algebra structure
(with a convolution product) on the vector space of distributions, i.e. vectors
describing how much of C, C♯, D and so on, are featured in a much generalized pc-set.

5 Up to a constant.

6 For technical reasons that will be made clear below, we do not take into account the
symmetries, e.g. $iv(n - k) = iv(k)$, and consider $iv_X$ as a vector in $\mathbb{R}^n$.

7 Just check the number of common tones between X and X + 5, using the second
formula in the definition above.

This looks fairly close to musical perception, at least as far as diatonicity and
chromaticity are concerned. However, let us take a closer look at two hexachords
which share the same value for iv(5) (see Fig. 2): H = {0,2,4,5,7,11} and
H' = {0,1,5,6,7,8}. The first one, H, is a subset of C major; the second, H', has
only five pcs in common with C♯ and G♯ major and appears substantially more
chromatic and less diatonic. 9


[Figure: bar plots of iv for {0,2,4,5,7,11} and {0,1,5,6,7,8}]

Fig. 2. iv for two hexachords


This provides evidence that, at least in some cases, the iv is not good enough
to discriminate between different degrees of diatonicity. This requires both
elucidation and improvement.

Anatol Vieru went deeper still in his analysis of diatonicity (or chromaticity),
and understood the importance of connectivity of fifths. In a diatonic (or
pentatonic) collection, we face an uninterrupted sequence of fifths, e.g. F C G D
A E B. In H, H', there are two broken fifth sequences, respectively (5, 0, 7, 2),
(4, 11) and (5, 0, 7), (6, 1, 8): the first collection H adheres more closely to the
generating structure of the diatonic scale than H'. Hence Vieru’s definition of
diatonicity and chromaticity: 10

Definition 1. The diatonicity (resp. chromaticity) of a pc-set is the maximal
number of consecutive fifths (resp. semitones) between elements of the pc-set.
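Definition 1 can be sketched as a search for the longest run of a given step inside the pc-set. The code below is an illustrative reading of the definition, not Vieru’s own procedure: the names `longest_chain`, `vieru_diatonicity`, `vieru_chromaticity` and `m5` are hypothetical, and the count returned is the number of consecutive intervals (one less than the number of pcs in the run).

```python
def longest_chain(pcs, step, n=12):
    """Maximal number of consecutive `step` intervals between elements of pcs (mod n)."""
    pcs = {p % n for p in pcs}
    best = 0
    for start in pcs:
        length, current = 0, start
        # Walk along the chain; the length bound stops the walk on a full cycle.
        while (current + step) % n in pcs and length < len(pcs) - 1:
            current = (current + step) % n
            length += 1
        best = max(best, length)
    return best

def vieru_diatonicity(pcs):
    """Longest run of fifths (step 7) inside the pc-set."""
    return longest_chain(pcs, 7)

def vieru_chromaticity(pcs):
    """Longest run of semitones (step 1) inside the pc-set."""
    return longest_chain(pcs, 1)

def m5(pcs, n=12):
    """Multiplication by 5 modulo n, which exchanges fifths and semitones."""
    return {(5 * p) % n for p in pcs}

H = {0, 2, 4, 5, 7, 11}
H_prime = {0, 1, 5, 6, 7, 8}
print(vieru_diatonicity(H), vieru_diatonicity(H_prime))
print(vieru_chromaticity(H), vieru_chromaticity(H_prime))
```

Reading the chromaticity of `m5(pcs)` gives the diatonicity of `pcs`, which is the multiplication trick mentioned in the text.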

In the above example, H gets 3 and H' only 2, though the values of iv(5) are
the same (4). Will the reader agree that the first is roughly 50% more diatonic
than the second? Notice that this value is less obvious to compute than the
iv, unless one skillfully multiplies 11 the pc-set by 5 and reads the sorted result
for chromaticity, which is a way of reading visually the value on the chain of
fifths (cf. right half of Fig. 3): the first pc-set turns into {10,11,0,1,7,8} and the
second into {11,0,1,4,5,6}.

8 Actually overrated since every tritone is tallied twice.

9 Many other examples can be devised if this one does not sound convincing to you.
A more blatant one would be {0,2,7,9} vs. {0,1,7,8}, both with iv(5) = 2.

10 “J’ai élaboré un procédé pour mesurer le degré de diatonisme et de chromatisme
d’un mode, basé sur la comparaison de la suite des quintes parfaites connexes avec
la suite des demi-tons connexes à l’intérieur du même mode.” [12]; Definition 1 is
more or less a translation of this.

11 Vieru had discerned that the two notions are interchanged by multiplication by 5
(or 7) modulo 12, the classical $M_5$ (or $M_7$) operator; and offered thoughtful insights
on this dichotomy as expressed by the affine group on $\mathbb{Z}_{12}$.



Fig. 3. Vieru’s chromaticity is lesser in H than H' (left) but diatonicity stronger for
H, as read on 5H and 5H' (right)


Let us cut this even finer. We would like to express that H = {0,2,4,5,7,11}
is more diatonic than H'' = {0,2,4,5,7,8} (and T = {0,1,5,6} less than T' =
{0,3,5,8}) though the “Vieru indexes” are identical.

One possible, dual argument would be that the covering chain of fifths is
shorter in one case than the other: 5 0 7 2 (9) 4 11 vs 5 0 7 2 (9) 4 (11 6 1)
8 (Fig. 4). This neatly compounds the inclusion criterion, the first scale being a
subset of a diatonic and not the second, but at the price of mixing two criteria
and increasing the computational complexity: should we then look up first the
lengths of the fifth-connected components and then, in case of a tie, the span
of the including chain of fifths? This is getting excessively complicated.



Fig. 4. Covering chain of fifths for {0,2,4,5,7,11}, {0, 2,4, 5, 7, 8}, {0,1,5,6} and 
{0,3, 5,8} 


In [7,8], Aline Honingh endeavors to compare any pc-set with the appropriate
‘prototype’: for instance a hexachord will be measured against the Guidonian
hexachord, a pentachord against the pentatonic, etc. For neatness, the pc-sets
are first reduced to so-called ‘basic-form’. 12 For instance, the two tetrachords
in the last example would be compared with the prototype C D F G (numeric
results depend on the choice of similarity measure), which may or may not favor
0 1 5 6 over 0 3 5 8. I will leave the reader to peruse further details in her
papers, not because this measure lacks interest, but quite the contrary (indeed
it allows one, for instance, to discriminate between the compositions of Beethoven’s
early, middle, and late periods): it gets extremely close to the last, simplest, and overall
best candidate.

I present here without any technicalities the values of saliency as defined in [9]
and used in numerous analyses since. Saliency is defined as the magnitude
of one easily computed complex number, here (in the case of diatonicity) the fifth
Fourier coefficient of a pc-set (formulas, references and properties will follow in
the next section). For now, let us appreciate the values of this evaluation of
diatonicity for all the above examples and some more. On Fig. 5, we can picture
the magnitudes of all Fourier coefficients of the aforementioned pc-sets, with
the diatonic scale first. We focus on the fifth magnitude (equal to the seventh),
highlighted by a dotted horizontal line, and notice that the ranking is: diatonic,
Z, Y, X and T with little difference between Y and X, and a larger discrepancy
with T.



Fig. 5. Saliency for the diatonic, Z, Y, X, and T


A similarly satisfying result also arises with the hexachords on Fig. 6, 
with an unambiguous ordering of diatonicities: {0,2,4,5,7,11} followed by 
{0,1, 5, 6, 7,10}, and last {0,1, 5, 6, 7,8}. 





Fig. 6. Saliency for the hexachords H,H',H" (horizontal line) 


Other examples unequivocally support this experimental evidence: that the
fifth saliency corresponds very closely with the intuitive perception of diatonicity.
We must look into the mathematics to understand why this should be, and above
all how this falls in with the competing measurements of diatonicity listed above.

12 In some cases this may not be the best for coincidence measurements: the more compact
form of a pc-set addresses its chromaticity, not its diatonicity - consider the preceding
discussion where the pc-set is first transformed by $M_5$.


1.2 Some Technical Definitions 

I provide only a cursory outline; the reader of the present paper will only need to 
bear in mind that some easily computed 13 quantities, called Fourier coefficients, 
feature interesting characterizations of those pc-sets which divide the octave as 
evenly as possible. 14 For a very pedagogical introduction to Discrete Fourier 
Transform (DFT) of pc-sets, see [4]. For thorough discussion and details, see the 
recent reference [3] which purports to give the state of the art. 

To each pc-set A, considered as a subset of $\mathbb{Z}_{12}$, is associated firstly its
characteristic function

$$1_A(x) = \begin{cases} 1 & \text{if } x \in A \\ 0 & \text{if } x \notin A \end{cases}$$

and second the Discrete Fourier Transform $\mathcal{F}_A = \widehat{1_A}$ of
this function, the DFT of the set:

$$\mathcal{F}_A : t \mapsto \sum_{x \in A} e^{-2i\pi xt/12}.$$

This function is a sum of complex numbers of the form $e^{i\theta}$ which can all be
construed as vectors $(\cos\theta, \sin\theta)$ of length 1, whose direction is given by the
phase $\theta$. The value $\mathcal{F}_A(k)$ is called the $k$-th Fourier coefficient. We will mainly
be concerned with its magnitude, i.e. the length of the sum of these vectors. 15
Here is a list of elementary though useful results without proofs:

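- The set A can be reconstructed from the knowledge of the Fourier coefficients
$\mathcal{F}_A(k)$.

- $\mathcal{F}_A(12 - k) = \overline{\mathcal{F}_A(k)}$ (conjugate complex number).

- $\mathcal{F}_{\bar{A}}(t) = -\mathcal{F}_A(t)$ for $t \neq 0$ ($\bar{A}$ is the complement of A).

- $\mathcal{F}_A(0) = \#A$

- $\sum_{k=0}^{11} |\mathcal{F}_A(k)|^2 = 12 \times \#A$

- The Fourier transform of the (12-dimensional) interval vector $iv_A$ is the
square of the magnitude of $\mathcal{F}_A$:

$$\forall k \in \mathbb{Z}_{12} \quad \widehat{iv_A}(k) = |\mathcal{F}_A(k)|^2. \qquad (\#)$$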
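These results are easy to confirm numerically for any given pc-set. The following sketch uses only the Python standard library; the helper names (`dft_of_set`, `dft_of_vector`, `interval_vector`, `check_properties`) are illustrative assumptions, and the checks are floating-point comparisons within a tolerance, not proofs.

```python
import cmath
import math

N = 12

def dft_of_set(pcs):
    """Fourier coefficients F_A(t), t = 0..11, of a pc-set A."""
    return [sum(cmath.exp(-2j * math.pi * x * t / N) for x in pcs) for t in range(N)]

def dft_of_vector(values):
    """DFT of an arbitrary 12-dimensional vector, with the same sign convention."""
    return [
        sum(values[x] * cmath.exp(-2j * math.pi * x * t / N) for x in range(N))
        for t in range(N)
    ]

def interval_vector(pcs):
    """iv(k) = #(A ∩ (A + k)) for k = 0..11."""
    pcs = set(pcs)
    return [len(pcs & {(p + k) % N for p in pcs}) for k in range(N)]

def check_properties(pcs, tol=1e-9):
    """Return a dict of booleans, one per property listed in the text."""
    pcs = set(pcs)
    coeffs = dft_of_set(pcs)
    comp = dft_of_set(set(range(N)) - pcs)
    iv_hat = dft_of_vector(interval_vector(pcs))
    return {
        "conjugate": all(
            abs(coeffs[(N - k) % N] - coeffs[k].conjugate()) < tol for k in range(N)
        ),
        "complement": all(abs(comp[t] + coeffs[t]) < tol for t in range(1, N)),
        "cardinality": abs(coeffs[0] - len(pcs)) < tol,
        "parseval": abs(sum(abs(c) ** 2 for c in coeffs) - N * len(pcs)) < tol,
        "interval_vector": all(
            abs(iv_hat[k] - abs(coeffs[k]) ** 2) < tol for k in range(N)
        ),
    }

print(check_properties({0, 2, 4, 5, 7, 9, 11}))
```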

Slightly more technical is the Huddling Lemma in [2]: in layman’s terms it
states that the closer the angles $\theta_k$, the larger the sum $\left|\sum_k e^{i\theta_k}\right|$ (the vectors pull
roughly in the same direction, coordinating their efforts). We will only need a
simple case:


13 One can compute them online at http://canonsrythmiques.free.fr/MaRecherche/ 
styled/. 

14 Originally discovered by Quinn [9] and formally proved in excruciating detail in [2]. 

15 The length of a complex number x + iy is $\|(x, y)\| = |x + iy| = \sqrt{x^2 + y^2}$.

Proposition 1. When the cardinality of A is fixed, $|\mathcal{F}_A(1)|$ reaches maximal
value when the elements of A are consecutive [i.e. when A is a chromatic chunk].

For us the most important result is 

Corollary 1. When the cardinality of A is fixed, $|\mathcal{F}_A(5)|$ reaches maximal value
when the elements of A are consecutive in the chain of fifths.

Proof. This follows from the relation $\mathcal{F}_A(5) = \mathcal{F}_{5A}(1)$, which results from 5 × 5 =
1 mod 12: hence the elements of 5A must be consecutive, which is equivalent
to the condition stated.
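Corollary 1 can also be observed by exhaustive search, since there are only 792 seven-note pc-sets. The sketch below is a brute-force check under the DFT convention used above; `magnitude`, `maximizers` and `is_chain_of_fifths` are hypothetical helper names introduced for this illustration.

```python
import cmath
import math
from itertools import combinations

def magnitude(pcs, t, n=12):
    """|F_A(t)| for a pc-set A."""
    return abs(sum(cmath.exp(-2j * math.pi * x * t / n) for x in pcs))

def maximizers(cardinality, t, n=12, tol=1e-9):
    """Largest |F_A(t)| among pc-sets of the given cardinality, and the sets reaching it."""
    scored = [(magnitude(c, t, n), c) for c in combinations(range(n), cardinality)]
    best = max(m for m, _ in scored)
    return best, [set(c) for m, c in scored if best - m < tol]

def is_chain_of_fifths(pcs, n=12):
    """True if 5*A is a run of consecutive residues, i.e. A is connected by fifths."""
    image = {(5 * p) % n for p in pcs}
    return any(all((s + i) % n in image for i in range(len(image))) for s in image)

best, sets = maximizers(7, 5)
print(round(best, 4), len(sets), all(is_chain_of_fifths(s) for s in sets))
```

The same search with `t = 1` illustrates Proposition 1, the maximizers then being the chromatic chunks.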

This is but a special case of Quinn’s result:

Among all pc-sets with same cardinality d, the maximum magnitude
for $\mathcal{F}_A(d)$ is obtained when A is a Maximally Even Set (ME
set).

ME sets admit many equivalent definitions [2,5]. We will need only to remember
the most important ME sets in $\mathbb{Z}_{12}$:

1. The octatonic scale for d = 8.

2. The diatonic scale for d = 7.

3. The whole-tone scale for d = 6.

4. The pentatonic scale for d = 5.

Quinn aimed at a landscape of chords (starting from experimental knowledge) 
and sketched first the highest peaks. From some kind of continuity principle, it 
was natural to infer that the height of a chord close to a summit would still 
be high. Hence the definition of saliency , as a quality of proximity to a ME-set 
(that Quinn called ‘prototype’): 

Definition 2. The d-saliency of a chord A is $|\mathcal{F}_A(d)|$.

1. Among d-chords, saliency is maximal for d-ME sets.

2. Remember if convenient that $|\mathcal{F}_A(d)| = |\mathcal{F}_A(12 - d)| = |\mathcal{F}_{\bar{A}}(d)|$, hence both
diatonic and (non hemitonic) pentatonic scales have maximum saliency for
index 5 (namely $2 + \sqrt{3} \approx 3.73$).

3. For any (reasonable) distance on the set of pc-sets, a pc-set close to a ME set 
has saliency close to maximal. 

4. Any pc-set (with given cardinality) distributes its saliencies according to its 
geometry: the sum of the squares of all saliencies is a constant. This echoes 
the idea in [8] that the distribution of [IC] categories throughout a piece tells 
of its local character. 

All this provides fairly good mathematical justification, corroborated by
empirical knowledge, for defining

Definition 3.

- The chromaticity of a pc-set A is $|\mathcal{F}_A(1)|$ (remembering
Proposition 1).

- The diatonicity of a pc-set A is $|\mathcal{F}_A(5)|$.

- The octatonicity of a pc-set A is $|\mathcal{F}_A(4)|$.
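Definitions 2 and 3 amount to one line of arithmetic per pc-set. Here is a minimal sketch, applied to the Stravinsky examples of Sect. 1.1; the function names and the dictionary `SETS` are assumptions made for illustration.

```python
import cmath
import math

def saliency(pcs, d, n=12):
    """d-saliency of a pc-set: magnitude of its d-th Fourier coefficient."""
    return abs(sum(cmath.exp(-2j * math.pi * d * x / n) for x in pcs))

def chromaticity(pcs):
    return saliency(pcs, 1)

def diatonicity(pcs):
    return saliency(pcs, 5)

def octatonicity(pcs):
    return saliency(pcs, 4)

SETS = {
    "D": {0, 2, 4, 5, 7, 9, 11},   # diatonic scale
    "X": {0, 4, 7, 9, 11},
    "Y": {0, 1, 2, 4, 7, 9, 11},
    "Z": {1, 3, 6, 8},
    "T": {0, 1, 3, 6, 7, 8, 9},
}

# Rank the pc-sets from most to least diatonic.
ranking = sorted(SETS, key=lambda name: diatonicity(SETS[name]), reverse=True)
for name in ranking:
    pcs = SETS[name]
    print(name, round(diatonicity(pcs), 3), round(chromaticity(pcs), 3),
          round(octatonicity(pcs), 3))
```

Comparing the three values for one pc-set gives a quick profile of its character, for instance whether T leans more towards the octatonic than towards the diatonic.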

Some other values have actually been used for musical analysis: J. Yust calls
‘quartal quality’ 16 the magnitude $|\mathcal{F}_A(2)|$ which is, for instance, maximal among
octachords for Tristan’s motif pc-set {2,3,4,5,8,9,10,11}; while the
‘major-thirdishness’ $|\mathcal{F}_A(3)|$, for want of a better term (‘augmentedness’?), is maximal
for an augmented triad, or for Schönberg’s Napoleon hexachord {0,1,4,5,8,9}.

Remembering the equation $\sum_k |\mathcal{F}_A(k)|^2 = 12\,\#A$, it could be argued that
the proper measure should be the squared magnitude - perhaps averaged by
the cardinality - since the sum of all these values is a constant. Also, it is the
squared value that appears in the DFT of the intervallic function. I will keep to
the original definition for the present paper, but would not be surprised if the
squared value were to supersede it in the future (following [17]).

2 DFT vs. iv 

2.1 Theoretical Advantage 

DFT is a change of (orthogonal) basis among many (polynomials, wavelets...).
The major advantage 17 of expressing a (musical: pc-set, rhythm...) phenomenon
in a basis of exponential functions is the following:

Proposition 2. The DFT exchanges convolution product * and termwise
product ×. Namely, if f, g are two maps from $\mathbb{Z}_{12}$ to $\mathbb{C}$ and $\hat{f}, \hat{g}$ their DFTs, then

$$\widehat{f * g}(k) = \hat{f}(k) \times \hat{g}(k).$$

This is crucial because iv is a convolution product:

$$iv_A(k) = \sum_t 1_A(t)\,1_A(t - k) = \sum_t 1_A(t)\,1_{-A}(k - t) = (1_A * 1_{-A})(k)$$

and more generally, any coincidence measure or correlation (say, the number
of elements of A that lie in any diatonic scale, i.e. any transposition D + k of
D = {0,2,4,5,7,9,11}) can also be read on a convolution product: 18

$$\sum_t 1_A(t)\,1_{D+k}(t) = \sum_t 1_A(t)\,1_D(t - k) = (1_A * 1_{-D})(k).$$

Now the convolution product is a... convoluted operation 19 while termwise
product is straightforward. Cognitively speaking, this means that complicated
operations become obvious in Fourier space (i.e. computing on Fourier coefficients)
and perhaps suggests that the human mind processes some equivalent of Fourier
coefficients.
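Proposition 2 and the two convolution formulas can be illustrated together. The sketch below computes cyclic convolutions naively (quadratic time) and compares them with termwise products of DFTs; the helper names `dft`, `convolve`, `indicator` and `negate` are assumptions, and the pc-set A is the set X of Sect. 1.1 chosen arbitrarily.

```python
import cmath
import math

N = 12

def dft(f):
    """DFT of a map from Z_12 to C, given as a list of 12 values."""
    return [
        sum(f[x] * cmath.exp(-2j * math.pi * x * t / N) for x in range(N))
        for t in range(N)
    ]

def convolve(f, g):
    """Cyclic convolution: (f * g)(k) = sum over t of f(t) g(k - t)."""
    return [sum(f[t] * g[(k - t) % N] for t in range(N)) for k in range(N)]

def indicator(pcs):
    """Characteristic function 1_A as a list of 0s and 1s."""
    return [1 if x in pcs else 0 for x in range(N)]

def negate(pcs):
    """The pc-set -A."""
    return {(-x) % N for x in pcs}

A = {0, 4, 7, 9, 11}
D = {0, 2, 4, 5, 7, 9, 11}

# (1_A * 1_{-D})(k) counts the elements of A lying in the transposition D + k.
coincidence = convolve(indicator(A), indicator(negate(D)))

# Proposition 2: the DFT of the convolution is the termwise product of the DFTs.
lhs = dft(coincidence)
rhs = [a * d for a, d in zip(dft(indicator(A)), dft(indicator(negate(D))))]
print(coincidence, max(abs(x - y) for x, y in zip(lhs, rhs)) < 1e-9)
```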

16 In a convincing study of Ruth Crawford Seeger’s White Moon [17]. 

17 This is characteristic of DFT up to permutations: see [3], Theorem 1.11. 

18 Yust observed that conversely - by inverse DFT - the number of common tones 
between two pc-sets can be expressed as a sum of products of magnitudes of Fourier 
coefficients, weighted by cosines of the differences of phases.

19 It has quadratic complexity, while termwise product is linear.

2.2 Multiplying Saliencies

For the sake of simplicity I present computations for diatonicity only 20, i.e.
comparing a pc-set A with various transpositions of the Diatonic D and considering
the fifth saliency. This is the core of the present article, making sense in a unified
way of all previous diatonicity measures. We analyse first the link between
coincidence and saliency. Coincidence with a prototype is a variant of Honingh’s
measure: $(1_A * 1_B)(k)$ is a high value when A + k shares many common values with B.
We are especially interested in the case when B is a diatonic scale, B = D or -D
or k - D etc.

Applying Proposition 2 yields immediately

$$\mathcal{F}_A(5) \times \mathcal{F}_{-D}(5) = \widehat{1_A * 1_{-D}}(5): \qquad (\#)$$

the product of the (diatonic) saliencies of A and -D is a Fourier coefficient of
the coincidence function of A and the diatonic scale. Low values of the latter
mean that bad correlation will limit the magnitude of $\mathcal{F}_A(5)$, i.e. the diatonicity
of A. Conversely, when does this coincidence function $1_A * 1_{-D}$ (replaced below
by $1_A * 1_D$ for simplicity’s sake) exhibit a high diatonicity? On the left-hand
side of equation (#), it means simply that A is highly diatonic (large value of
$|\mathcal{F}_A(5)|$). On the right-hand side, it means that the coincidence function $1_A * 1_D$

1. has at least some large values

2. and is ‘diatonic’ (large fifth Fourier coefficient).

In order to understand how the simple computation of saliency supersedes all
previous notions, let us analyse this last feature, which means (in the case of
diatonicity) being strongly 5-periodic: the prototype, the diatonic scale D, is a
chain of fifths, meaning that D + 5 has 7 - 1 = 6 common elements with D. 21
From this follows an automatic quasi-periodicity of $1_A * 1_D$ (see Fig. 7):

Fig. 7. Coincidence between D and A or A + 5 changes at most by 1

Proposition 3. The difference between the correlations
$|(1_A * 1_D)(k + 5) - (1_A * 1_D)(k)|$ is either 0 or 1.

20 It would be even simpler for chromaticity (as suggested by a reviewer) but of less 
interest for actual analysis. 

21 One can use either 5 or 7 as generator of a chain of fifths.

Proof. These two convolution products expressed as sums share 6 common
elements, plus another one that can be either 0 or 1. More precisely, setting
D = {5m, m = 0 ... 6} for simplicity, we get

$$(1_A * 1_D)(k) = \sum_{m=0}^{6} 1_A(k - 5m) = 1_A(k - 30) + \sum_{m=0}^{5} 1_A(k - 5m)$$

$$(1_A * 1_D)(k + 5) = \sum_{m=0}^{6} 1_A(k + 5 - 5m) = \sum_{m=0}^{6} 1_A(k - 5(m - 1)) = 1_A(k + 5) + \sum_{m=0}^{5} 1_A(k - 5m),$$

hence the two values coincide when $1_A(k + 5) = 1_A(k - 30)$ ($= 1_A(k + 6)$
modulo 12), and differ by one if not.

How then can $\widehat{1_A * 1_D}(5)$ be as large as possible? On the one hand, the geometry
of the diatonic itself partly ensures some periodicity of $1_A * 1_D$ (Proposition 3),
which boosts its diatonicity. How can we further increase this periodicity?

Let for example k = 0 in the condition $1_A(k + 5) = 1_A(k + 6)$ just derived:
we will have $1_A(5) = 1_A(6)$ when neither F nor F♯ are elements of A (or both),
for instance when A = {0,2,4,7,9,11} (appropriately chiming the first notes
of ‘Do you know what it means’). But in order to enlarge the remaining sum
$\sum_{m=0}^{5} 1_A(0 - 5m)$, we will need as many elements of A as possible in the partial
chain of fifths C D E G A B (each adds 1 to the value of the convolution product).
This will certainly be satisfied when A features a long connected subsequence
of the chain of fifths. 22 We have just understood, not only how the saliency
notion includes Vieru’s definition, but also why it is superior: Vieru’s measure
is identical for H and H'' but in the former case the elements of H are better
huddled in the chain of fifths, providing a larger tally of large correlation values of
the convolution product $1_H * 1_D$ (coincidence of H with the prototypical diatonic
scale). Let us check this by computing some numerical values. Listing the values
of the convolution products from 0 to 11 yields

$1_H * 1_D = [6,2,4,3,3,5,2,5,2,4,4,2]$ and $1_{H''} * 1_D = [3,3,3,3,5,3,4,3,3,4,3,5]$.

For tetrachords T = {0,1,5,6} and T' = {0,3,5,8}, it is perhaps even clearer:

$1_T * 1_D = [2,2,2,2,3,2,3,2,2,2,2,4]$ and $1_{T'} * 1_D = [2,2,3,1,4,1,3,2,2,4,0,4]$.

Notice in the latter case how the value 4 occurs thrice in a row (in fifth order:
at positions 11, 4, 9), in agreement with the geometric constraint found above.
Indeed the 5-saliency of T' is greater than T’s. Similarly, H is more diatonic
than H'' because of the sequence of high values (in fifth order) ...4,5,6,5,4...
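Coincidence vectors of this kind are straightforward to recompute. The sketch below assumes the convention that entry k counts the elements of A + k lying in the C major scale D = {0,2,4,5,7,9,11}, an assumption under which the vectors for H, T and T' come out as listed; the function names are hypothetical.

```python
import cmath
import math

D = {0, 2, 4, 5, 7, 9, 11}  # C major, taken as the prototypical diatonic scale

def coincidence_vector(pcs, scale=D, n=12):
    """Entry k is the number of elements of pcs + k that belong to the scale."""
    return [len({(p + k) % n for p in pcs} & scale) for k in range(n)]

def diatonicity(pcs, n=12):
    """Fifth saliency |F_A(5)|."""
    return abs(sum(cmath.exp(-2j * math.pi * 5 * x / n) for x in pcs))

def in_fifth_order(vector, n=12):
    """Reorder a 12-dimensional vector along the cycle of fifths 0, 5, 10, 3, ..."""
    return [vector[(5 * m) % n] for m in range(n)]

H = {0, 2, 4, 5, 7, 11}
T = {0, 1, 5, 6}
T_prime = {0, 3, 5, 8}

for name, pcs in [("H", H), ("T", T), ("T'", T_prime)]:
    vec = coincidence_vector(pcs)
    print(name, vec, in_fifth_order(vec), round(diatonicity(pcs), 3))
```

Printing the vector in fifth order makes the huddling of high values visible at a glance, which is what the single value of the fifth saliency summarizes.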

Of course, computing these correlation vectors with the diatonic would
provide an effective and convincing measurement of diatonicity 23; but as we have
demonstrated, the lone and straightforward value of saliency neatly subsumes
the whole vector.

22 But also almost connected chains, like F C G A E B.

23 As a shrewd reviewer noticed, it would also be feasible to correlate interval profiles,
but our aim is to find a recipe at once simple, general and efficient.


2.3 Inclusion and iv 

It is redundant but perhaps useful to synthesize briefly the case of the crude 
inclusion as compared to saliency in the light of the above calculations. Inclusion 
of a pc-set inside (say) a diatonic scale is indeed a coincidence measure that can 
be pinpointed as one large coefficient in 1_A * 1_{-D} (at least one value equal to the
cardinality of A, some other large values according to Proposition 3). This is but
a special case of the preceding discussion, wherein it was shown that significant
diatonicity depends not only on the number of coincidences but also on their
grouping, or ‘huddling’. The same goes for large values of iv(5) (many fifths),
which are only indicative of diatonicity when most of the fifths are neighbors
in the chain. 24 The extremities of the smallest chain of fifths containing a given
pc-set are of course directly related to the number of overlapping diatonic scales
(i.e. the tally of maximum values of the convolution product), as foretold in Vieru’s
notion of ‘rich modes’.

2.4 Musical Examples 

To gain perspective, let us veer away from diatonicity. D. Tymoczko’s thoughtful
analysis of Stravinsky in [11] draws interpretation of pc-sets towards specific
classes of scales. To his credit, he acknowledges the numerous ambiguities, criticizes
fuzziness in previous analyses and avoids dogmatic pronouncements. Still,
dataless statistical sentences like ‘... [this] scale accounts for virtually all of the
pitches present’ leave room for contestation (I highlighted the adjective). On the
other hand, exact measurements of diatonicity as the magnitude of F_A(5), and of all
other saliencies, can be compared both within Stravinsky’s own music, as it
varies within a single piece, and from one piece to another; furthermore, this
objective indicator can be applied to other composers (notably Slavic) and provide
objective comparisons of their relative degrees of diatonicity, chromaticity,
or octatonicity.

The interest of such comparisons warrants general and systematic research 
that cannot be included in this short paper. Here is but a small sample. 

(1) To assess the general appreciation allowed by measurement of saliencies, I 
have compared all six saliencies (from chromaticity to whole-toneness) on several 
pieces of The Rite of Spring and, as an external reference, the Dance of the 
Firebird. The pieces are imported as MIDI files and a time-window of fixed 
width moves over it for computation of the saliencies of its pc-sets. Figure 8 
simply exhibits the mean values of these saliencies. 25 
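The windowed procedure just described can be sketched as follows. This is only an illustration under stated assumptions: notes are given as hypothetical `(onset, duration, midi_pitch)` tuples in seconds rather than read from a MIDI file, a note belongs to a window when it sounds at any time inside it, and the toy input is an ascending C major scale, not an excerpt from Stravinsky.

```python
import cmath

def saliencies(pcs, n=12):
    """Magnitudes of the Fourier coefficients k = 1..6 of a pc-set."""
    return [abs(sum(cmath.exp(-2j * cmath.pi * k * p / n) for p in pcs))
            for k in range(1, 7)]

def windowed_saliencies(notes, width=1.0, hop=0.5):
    """Slide a window of fixed width over the notes and compute the
    saliencies of the pc-set sounding in each window."""
    end = max(on + dur for on, dur, _ in notes)
    frames = []
    t = 0.0
    while t < end:
        pcs = {pitch % 12 for on, dur, pitch in notes
               if on < t + width and on + dur > t}
        frames.append((t, saliencies(pcs)))
        t += hop
    return frames

def mean_saliencies(frames):
    """Average each of the six saliencies over all windows."""
    return [sum(s[k] for _, s in frames) / len(frames) for k in range(6)]

# Toy input (hypothetical): one octave of C major, half a second per note.
notes = [(0.5 * i, 0.5, 60 + p)
         for i, p in enumerate([0, 2, 4, 5, 7, 9, 11, 12])]
frames = windowed_saliencies(notes, width=2.0, hop=0.5)
means = mean_saliencies(frames)
print(means)
```

Changing `width` in this sketch is the analogue of expanding the time-span of the window.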

24 The converse is not true: consider CDE which is undoubtedly diatonic though 
iv(5) = 0! 

25 It appears that there is little difference when the time-span of the window is expanded
from 1 to 2 or even 3 s.

Fig. 8. Mean values of saliencies on some Stravinsky pieces (surviving panel titles: Invocation of Ancients, Sacrifice, Dance of Firebird)


The figures show ambiguity in many pieces, which satisfyingly reflects the 
diversity of experts’ interpretations! However, some clear-cut features do emerge: 

1. Whole-tone character dominates The Dance of the Firebird. 

2. The very first piece of The Rite of Spring is fairly diatonic. 

3. The Dance of Spring is more clearly diatonic. 

4. The Dance of Earth is mostly whole-tonish. 

5. In other pieces, the balance (interplay?) between octatonic and diatonic is
apparent, in line with van den Toorn’s or Taruskin’s analyses (as quoted
in [11]).

(2) To give a feeling of the variety of these characters in the flow of the pieces, 
I provide some excerpts of saliencies as functions of time. On Fig. 9, following 
the first minute or so of the first movement of The Rite of Spring , the saliencies 
are squared (so that their sum is a constant 26 ), and thus it is easily seen which 
character predominates in a given passage. 
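A short numerical check of why squaring helps: by Parseval's identity the squared magnitudes of all twelve Fourier coefficients of a pc-set with n elements add up to 12n, and folding conjugate pairs together leaves a weighted sum over k = 1..6 that depends on n alone. The half weight on the sixth coefficient is an assumption of this sketch, since the normalization used for the plot is not spelled out.

```python
import cmath

def fourier_magnitudes(A, n=12):
    """Magnitudes of all n Fourier coefficients of the indicator of A."""
    return [abs(sum(cmath.exp(-2j * cmath.pi * k * a / n) for a in A))
            for k in range(n)]

def weighted_square_sum(A, n=12):
    """Sum of squared saliencies for k = 1..5 plus half the square for k = 6.
    Parseval's identity makes this equal to len(A) * (n - len(A)) / 2."""
    m = fourier_magnitudes(A, n)
    return sum(m[k] ** 2 for k in range(1, 6)) + 0.5 * m[6] ** 2

for pcs in ({0, 1, 5, 6}, {0, 3, 5, 8}, {0, 1, 2, 4, 7, 9, 11}):
    print(sorted(pcs), weighted_square_sum(pcs))
```

Two sets of the same size give the same total, so in each window the squared saliencies compete for a fixed budget.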

26 Up to the cardinality of pc-sets. On these pictures, the dotted line shows the mean
value of a saliency and the solid line a reference value: for a_5, say, it is the mean
value found for a Mozart Sonata.

It is best to look at Fig. 9 while listening to The Rite’s beginning. One can
practically see the indecisive first bars (motif X) flash a spurt of chromaticism
(when the C♯ interferes ca. 6″) before settling for diatonicism (when the D is
added to make up Y = {0, 1, 2, 4, 7, 9, 11}). Then the chromatic fourths around
15″ boost a_1; Z = {1, 3, 6, 8} occurs between 36″ and 40″, flirting with a pentatonic,
i.e. largely diatonic, character; finally, the last ambivalent motif T is played
after 1′, a short surge of chromaticism in a ‘quartal’ episode (large a_2).



Fig. 9. Variations of saliencies in first minute of The Rite of Spring


This last moment exemplifies that other segmentations could, and should, 
be applied to music as it is perceived (as opposed to the music read on the 
score), for here T is clearly perceptible against the bass, though the numerical 
computation mixed everything together. Indeed, analyzing separate instruments, 
or voices, or groups, if justified on perceptual grounds, can lead to finer analyses, 
see examples in [11,15], and would undoubtedly constitute an easy improvement
of saliency analysis. 27


2.5 Phase and Tonality 

The (random) colors on these pictures could be adjusted to reflect the phase
(direction of vectors) of the Fourier coefficient, which reflects a generalization
of tonality (for a_5 it can be checked against the values for 12 major scales or
triads, for a_6 it would be against the two whole-tone scales, etc.). Detection of
the character of a passage (diatonic, octatonic, etc.) can be compounded by pinpointing
which (say) diatonic paradigm is involved, by computation of the phase.
This is a simple way to detect tonality, and its generalizations (which whole-tone,
or octatonic, scale is prevalent, etc.). More about this in [3], Chap. 6.

27 Hopefully more exhaustive analyses of saliency of Slavic music of the early XXth century
will soon appear, and settle once and for all the question of their octatonicity.
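The phase comparison for the diatonic case can be sketched as follows: the phase of the fifth coefficient of a pc-set is compared with the phases of the twelve transpositions of the diatonic scale, and the nearest one is returned. The function name and the return value (the tonic pitch class of the matching major scale) are choices made for this illustration; the result identifies a scale collection, not a mode.

```python
import cmath
import math

C_MAJOR = {0, 2, 4, 5, 7, 9, 11}

def coeff(A, k, n=12):
    """k-th Fourier coefficient of the indicator function of A."""
    return sum(cmath.exp(-2j * cmath.pi * k * a / n) for a in A)

def closest_diatonic(A):
    """Tonic pc of the major scale whose fifth coefficient has the phase
    nearest to that of A (distance measured on the circle)."""
    target = cmath.phase(coeff(A, 5))

    def gap(t):
        scale = {(p + t) % 12 for p in C_MAJOR}
        d = cmath.phase(coeff(scale, 5)) - target
        return abs(math.atan2(math.sin(d), math.cos(d)))

    return min(range(12), key=gap)

print(closest_diatonic({0, 2, 4, 7, 9}))   # C pentatonic
print(closest_diatonic({7, 9, 11, 2, 4}))  # the same set transposed up a fifth
```

Sets that sit exactly halfway between two scales on the chain of fifths (a single major triad, for instance) produce a tie, so a real implementation would need to report ambiguity rather than a single answer.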


2.6 Possible Applications to Dodecaphonic Music 

A hasty reasoning might conclude that the calculations above are meaningless in 
dodecaphonic music, since the Fourier coefficients of the chromatic aggregate are 
nil. It is not so. It is certainly true of Nicolai Obouhow’s “harmonie totale” 28 , 
but usually false in classical serial music when an appropriate time-span is used 
for the window of analysis, because the tone-row is often stated horizontally, 
not vertically; furthermore, at least in the second Viennese school, compositions
using the two halves (tropes) of the row are frequent. Of course a trope can be
any hexachord, with distinctive saliencies; however (essentially this is Babbitt’s
theorem), the saliencies of both tropes of a row are identical. For instance, analyzing
both tropes in Alban Berg’s Lyrische Suite op. 28 and Violin Concerto
op. 34 shows very strong diatonic components, see [3], p. 122. I fancy that this
is a general feature of Berg’s serial music (as opposed to Webern or Schönberg,
say) but my ongoing computations have been impeded by the lack of available
MIDI files for XXth-century music.
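The equality of saliencies for the two tropes is easy to verify numerically: for every k other than 0, the Fourier coefficient of the complement of a set is the negative of the coefficient of the set, so the magnitudes coincide. The hexachord below is an arbitrary illustrative choice (a diatonic hexachord), not a transcription of any particular row.

```python
import cmath

def saliencies(A, n=12):
    """Magnitudes of the Fourier coefficients k = 1..6 of a pc-set."""
    return [abs(sum(cmath.exp(-2j * cmath.pi * k * a / n) for a in A))
            for k in range(1, 7)]

hexachord = {0, 2, 4, 5, 7, 9}            # illustrative trope
complement = set(range(12)) - hexachord   # the other half of the aggregate

first = saliencies(hexachord)
second = saliencies(complement)
print([round(v, 3) for v in first])
print([round(v, 3) for v in second])
```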

3 Conclusion 

From the perspective developed here, one gets a feeling that many worthy 
researchers have groped for years more or less in the same direction, feeling 
for the right definition of diatonicity without knowing exactly where it lay. Then 
came Ian Quinn, and lo! the Holy Grail was there for everyone to grasp. 

Not only does saliency pinpoint the character (or lack thereof) of a piece of 
music, the other component of the Fourier coefficients (the phase) also points its 
precise direction (the tonality, in the diatonic case). 

Precise measurements can, at long last, supersede empirical (at best, with 
bevies of bored and fallible test subjects) or completely subjective (at worst, and 
all the more virulent for it) evaluations. 

Moreover, this kind of analysis is valid for a huge repertoire, since all that 
was said here mostly for the diatonic character stands just as well for the 5 
other characters. It is hoped that saliency diagrams, pictures and movies will 
be developed for many pieces of music in the very near future. Indeed, it is 
only a slight exaggeration to fancy deaf people enabled at last to appreciate 
music, simply by looking at ‘Fourier clocks’ ticking as the Fourier coefficients 
vary throughout a piece! 29 It is an urgent task to develop some appropriate 
software for this kind of streaming analysis, picturing the Fourier flow of music 
on the fly. 


28 His chords systematically include all twelve pcs.

29 Technically this is true since the music can be retrieved from the data of all Fourier
coefficients.

Distributions Through the DFT

Abstract. Pitch-class distributions are central to much of the computational
and psychological research on musical keys. This paper looks at
pitch-class distributions through the DFT on pitch-class sets, drawing
upon recent theory that has exploited this technique. Corpus-derived
distributions consistently exhibit a prominence of three DFT components,
f_5, f_3, and f_2, so that we might simplify tonal relationships by
viewing them within two- or three-dimensional phase space utilizing just
these components. More generally, this simplification, or filtering, of distributional
information may be an essential feature of tonal hearing. The
DFTs of probe-tone distributions reveal a subdominant bias imposed by
the temporal aspect of the behavioral paradigm (as compared to corpus
data). The phases of f_5, f_3, and f_2 also exhibit a special linear dependency
in tonal music giving rise to the idea of a tonal index.


Keywords: Tonality • Key finding • DFT • Phase space • Probe tone 


1 Introduction 

Few studies in music psychology have stimulated as much interest and debate
as Carol Krumhansl and Edward Kessler’s 1982 article on tonal hierarchy [11].
While it is important for establishing the probe-tone technique as a behavioral
correlate of the sense of key (and the consequent focus on pitch-class distributions
in research on the topic), central also to its impact, one suspects, was the
visualization of key relationships by deriving a toroidal space from the probe-tone
data. This two-dimensional toroidal geometry of key distances, derived
by applying multi-dimensional scaling (MDS) algorithms to the correlations
between pitch-class distributions, was not necessary to establishing the efficacy
of the probe-tone method, nor was it a necessary component of the distributional
model or subsequently developed key-finding algorithms (which use correlations
between distributions directly, not filtered through the two-dimensional simplification
of the MDS solution). Yet the validation of common habits of thought and
language relating to musical keys, spatial metaphors of distance, direction, and
region, and of widely used theoretical models (such as the Schoenberg-Weber
chart of regions and the Tonnetz) kindled the imaginations of a wide range of
subsequent researchers.

© Springer International Publishing AG 2017

Spatial models raise a number of significant questions about the nature of
musical keys, many of which have been examined in the music perception and 
cognition literature on the topic. This paper demonstrates that the discrete 
Fourier transform (DFT) on pcsets can clarify these questions and in some cases 
suggest novel solutions. 

Krumhansl ([10], pp. 99-106) noted that the spatial representation of keys in 
Krumhansl and Kessler 1982 could be reproduced, without recourse to MDS, by 
taking the third and fifth phase components of the Fourier analysis of the key 
profiles. For Krumhansl this theoretical reformulation of the space is primarily 
an expedient allowing for the plotting of various kinds of information (expert 
key assignments, distributional data in the music) in a fixed space. The practical 
problems can be overcome by clever use of computational techniques like self-organizing
maps, as [12,14] have shown. But, as I will argue here, Krumhansl’s
simplification using the DFT is of considerable theoretical interest in its own
right, especially in light of more recent applications of this same type of space
[2,3,26,27]. In particular, basic mathematical properties of the DFT allow us to
draw more far-reaching conclusions about this space and its significance to the
nature of tonality.

Much of this research has produced different kinds of pitch-class distributions
that can be analyzed using the DFT on pitch-class vectors, as described by
[3,13,17,25,27]. The terminology used here is taken from [25]. The entries in
the DFT vector are referred to as “components” and denoted f_0, f_1, f_2, .... They
are converted to polar coordinates with magnitude |f_n| and phase φ_n, but with
phases converted to a pitch-class scale and designated Ph_n = 2π(φ_n)/12.
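As a concrete illustration of these terms, the sketch below computes the components of a 12-element pitch-class vector and expresses each phase on a pitch-class scale. Two conventions are assumptions of this sketch rather than statements of the paper's definitions: the exponent carries a minus sign, and the angle is rescaled so that a full turn equals 12. With these choices a C major triad gets Ph_5 = 1.5, in the same region as the major-key Ph_5 values reported in Table 1.

```python
import cmath
import math

def dft(x):
    """Discrete Fourier transform of a 12-element pitch-class vector."""
    n = len(x)
    return [sum(x[p] * cmath.exp(-2j * cmath.pi * k * p / n) for p in range(n))
            for k in range(n)]

def magnitudes_and_phases(x):
    """List of (|f_k|, Ph_k), with Ph_k rescaled to the range [0, 12)."""
    result = []
    for f in dft(x):
        ph = (cmath.phase(f) * 12 / (2 * math.pi)) % 12
        result.append((abs(f), ph))
    return result

c_major_triad = [1 if p in (0, 4, 7) else 0 for p in range(12)]
components = magnitudes_and_phases(c_major_triad)
for k in (2, 3, 5):
    print(k, round(components[k][0], 3), round(components[k][1], 3))
```

Phases of components whose magnitude is close to zero are numerically unstable in such a computation and should not be interpreted.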

2 Tonal Distributions 

The large body of research that has grown out of Krumhansl’s work has produced 
an abundant crop of tonal distributions. These come in two or three basic forms. 
Krumhansl and Kessler’s ([11]) original distributions are probe-tone ratings from
human subjects. Subsequent studies, such as [5,6,20], applied the probe-tone
technique in varying contexts, or other experimental tasks that produce comparable
distributional data, such as the wrong-note detection technique used by
[8]. Another method that has produced many distributions for key-finding algorithms
(further discussed in the next section) is to derive distributions from the
frequency of occurrence of scale degrees in a corpus. Finally, other distributions
(e.g. in [19,21]) are created “by hand” to optimize the performance of key-finding
algorithms.

Figures 1, 2 and 3 plot distributional data from a variety of sources in three
different Fourier phase spaces. (The locations of all major and minor triads and
two diatonic scales are also given for reference.) Fig. 1 shows part of the Ph_3/Ph_5
space used by Krumhansl ([10]) and which is also the basis of Amiot’s ([2,3]) and
my ([26]) continuous Tonnetz. Despite the great variety of techniques represented
by corpus-derived distributions (using entire pieces with or without accounting
for modulations, using just the initial or final measures, using melodies only or
polyphonic textures, counting pitch-classes in different ways), all are bunched
very closely together, near the tonic triad but typically with a slightly higher
Ph_5, possibly reflecting a bias towards the dominant. Temperley’s (“CBMS”)
and Sapp’s hand-made distributions are close enough to these, but lie on the
fringes of the pack.

Fig. 1. Ph_{3/5} plot of corpus distributions from a variety of sources (Yb: [24], K&H:
[9], K-P, Essen, Temp: Kostka-Payne, Essen, and Temperley corpora from [22], BB:
Bellman-Budge from [19], P&S: [16], A&S: [1])


Figure 2 plots the same data with Ph_2 replacing Ph_3. Major-key data spreads
out a little more in the Ph_2 dimension, but on the whole we can reach the same
conclusions. The interchangeability of Ph_2 and Ph_3 relates to a basic property
of tonality, the tonal index, explored further below. Other components do not
provide the same kind of essential tonal information, as the Ph_{1/4} plot in Fig. 3
illustrates. Major-key data are particularly unfocused in the Ph_4 dimension and
the minor-key data in the Ph_1 dimension. Even where a certain amount of
consistency might be found, such as the minor-key profiles in the Ph_4 dimension,
it is closer to unrelated triads like B major, B minor, and D minor.

Fig. 3. Ph_{1/4} plot of corpus distributions.

The probe-tone profiles are more variable, but reliably close to the corresponding
corpus data. Figure 4 gives a variety of major-key probe-tone data
reflecting a variety of experimental paradigms. Cuddy and Badertscher (“C&B”)
include major-triad and major-scale contexts on three levels of musical background.
Brown, Butler, and Jones (“BBJ”) replicate these (“triad1,” “scale1”)
and also test contexts that reorder the tones of each (“triad2,” “scale2”). Smith
and Schmuckler randomly generate contexts using Krumhansl and Kessler’s
profiles weighting tones by duration (“S&S1”) or frequency (“S&S2”) at varying
levels. Janata et al. (“J&al”) use a very different method of wrong-note
detection. Despite such differences in experimental paradigm, these data very
consistently deviate from the corpus data on the subdominant side. The difference
may result from the temporal aspect of the probe-tone task: listeners
evaluate, not a note merely in the given context, but after it, and motion to
the left in Ph_5 (descending thirds or fifths) is much more typical of tonal music
than to the right, particularly at endings and moments of resolution. Particularly
striking is Brown, Butler, and Jones’s reordering of the arpeggiated triad,
which appears to consistently imply F major more strongly than C major.

Fig. 4. Ph_{3/5} plot of corpus distributions and probe tone distributions from a variety
of sources (KK: [11], C&B: [6], BBJ: [5], S&S: [20], J&al: [8]).

To examine the matter more closely, let us focus on a single, fairly rich, body
of corpus data collected by Prince and Schmuckler ([16]). Tables 1 and 2 show
the DFTs for their data collapsed over metric position but divided by composer. 1
These data are average tonal profiles for each composer, with all pieces transposed
to C major or C minor, but with no accounting for modulations. The
data for Bach, Mozart, Beethoven, and Chopin represent relatively large samples
(between 20,000 and 120,000 quarter notes for each data point) while those
for Schubert, Liszt, Brahms, and Scriabin are smaller (1700-16,000). Despite
the wide range of harmonic styles represented, one very clear conclusion can be
drawn from the DFT magnitudes: with one exception, |f_5| is always very large,
followed by |f_3| then |f_2|. This agrees with results from [7] whose ic5 category may
be roughly equated with |f_5| through Quinn’s [17] “intervallic half-truth.” 2 The
last three components (f_1, f_4, and f_6) are negligibly small, in most cases less than 1% of the
total “amplitude” of the distribution. The one exception is Liszt-major, which
differs from all the other distributions in that f_5 and f_3 are equally prominent.
While this may point to something special in Liszt’s harmonic style, we should
not make too much of this distribution, since it represents only three pieces
(Grande étude de Paganini 4, Liebesträume 3, and Transcendental Etude 5). 3
One other discernible stylistic difference is the greater emphasis on diatonicity
(f_5) in Bach versus all later composers. This is particularly pronounced in the
minor mode, where later composers typically put more weight on f_2 and f_3 at
the expense of f_5.

1 Jon Prince generously shared this raw data through personal correspondence.


Table 1. DFTs of corpus data from Prince and Schmuckler [16], for major keys.
Squared magnitudes are multiplied by 10^4.

Composer    |f_1|^2  |f_2|^2  |f_3|^2  |f_4|^2  |f_5|^2  |f_6|^2     Ph_1     Ph_2     Ph_3     Ph_4     Ph_5     Ph_6
Bach              2       79      150       10     2095        2     9.89     0.96     0.38     3.74     1.96        6
Mozart            9      243      310        1     2158        4     9.34    11.91     0.96     8.04     1.66        0
Beethoven         4      182      287        7     1427        0     8.31    11.53     1.34     6.43     1.53        6
Schubert          7      127      337       18     1931        0     8.25     0.18     1.06     7.70     1.65        0
Chopin           10      186      357       11     1638        1     7.43    11.72     1.03     8.22     1.26        0
Brahms            2       49      224        5     1009        2     7.74     0.27     0.68     9.37     1.59        0
Liszt             5       85      402       30      394       43     6.21     0.12     0.68     9.37     1.59        0
Scriabin         11      117      352       47     2154        2     8.93     1.09     0.89     7.12     1.93        6



The generally low magnitudes of f_1, f_4, and f_6 explain another feature of
the distributions: the lack of consistency in phases for these components. Phase
values should become more volatile as the magnitudes approach zero, where the
phase becomes undefined. However, it is logically possible to expect variability in
phase values for the well-represented components (f_2, f_3, and f_5). Such variation
could reflect real differences between composers, who represent a wide range of
styles. On the whole, however, we do not see much variability in these phases,
and the data, as in Fig. 1, tend to cluster close to the Ph_2, Ph_3, and Ph_5 values

2 There is also agreement on the minor-key data, which shows smaller |f_5|s and correspondingly
fewer ic5-category designations. The DFT data is less equivocal on the
secondary features of tonality, however, which clearly relate to f_3 and f_2. This surfaces
in Honingh and Bod’s results in the form of ic3- or ic4-category pcsets, but it
is hard to draw as clear-cut a conclusion from this aspect of their results.

3 With the assistance of Matthew Chiu, I have recently assembled a larger data set of
distributions that confirms these conclusions, including the pronounced low diatonicity
of Liszt’s music, especially in the minor mode where it reaches a level approximately
equal to that of f_3. The tendency can also be seen in Wagner and Scriabin,
but not quite as strongly as in Liszt.

Table 2. DFTs of corpus data from Prince and Schmuckler [16], for minor keys.
Squared magnitudes are multiplied by 10^4.

Composer    |f_1|^2  |f_2|^2  |f_3|^2  |f_4|^2  |f_5|^2  |f_6|^2     Ph_1     Ph_2     Ph_3     Ph_4     Ph_5     Ph_6
Bach              8       69      208       49     1489        3     8.72    10.80     2.29     2.60    11.60        6
Mozart           17      195      656       10     1515        6     8.97    10.53     2.24     0.15    11.91        6
Beethoven         2      239      457        5     1150       16     6.33    10.18     2.51     1.73    11.70        6
Schubert          9      238      540       14     1815        1     9.20    10.47     2.62    11.76    11.93        0
Chopin            0      149      336        6     1002        6     6.57    10.36     2.39     2.21    11.88        6
Brahms            9      194      390        2      847       19     8.23     9.89     1.87     2.14    11.68        6
Liszt             0      254      651        1     1179       14     4.83    10.75     1.75     1.30     0.40        6
Scriabin          5      237      399       48     1524       19    10.10    10.88     1.25     0.61    11.91        0


for the tonic triads, as can be seen in Fig. 5. The only stylistic differences evident 
here are the greater tendency of Bach’s distributions toward the diatonic scales, 
especially in major, and the opposite tendency of Liszt’s distributions toward 
the parallel keys. 

This striking result suggests a signal processing analogy to explain tonality as
a kind of band-pass filter for pitch-class information. The tonal filter suppresses
certain frequencies (f_1, f_4, and f_6) while amplifying others (f_2, f_3, and f_5).
As a definition of tonality, this has the advantage that it can be treated either
as a property of music or as a way of hearing or interpreting music. That is,
to the extent that music is tonal, it will tend to feature harmonic content that
emphasizes f_2, f_3, and f_5, and tonal interpretations of music are those that filter
out f_1, f_4, and f_6, possibly with disregard for a prominent status for one of those
components. For instance, octatonic music (such as certain pieces by Messiaen)
will have a prominent f_4, but a tonal interpretation of octatonic music will
suppress this feature in order to amplify f_2, f_3, and f_5, which may be controlled
by choice of subsets or emphasized notes within the given octatonic context. This
means that a three-dimensional phase space, Ph_{2/3/5}, may be a sufficient and
more stable tonal state space than the original 12-dimensional space of pitch-class
distributions, since each key occupies a distinct region of Ph_{2/3/5}-space.
However, we have also found that a two-dimensional toroidal space appears to
be sufficient for distinguishing keys. This reflects an additional constraint that
seems built into tonal syntax, a linear dependence between Ph_2, Ph_3, and Ph_5.
This linear constraint, Ph_2 + Ph_3 − Ph_5 ≈ 0, gives rise to a “tonality index”
that will be further discussed below. Given such a linear constraint, the three-dimensional
space of tonality may be projected onto any of its two-dimensional
subspaces with (ideally) no essential information loss.

Fig. 5. Ph_{3/5} plot of corpus distributions from Prince and Schmuckler [16].
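The filter analogy can be made concrete with a small sketch. It is an idealized version under stated assumptions: the "tonal filter" here simply zeroes f_1, f_4, and f_6 (with their conjugate partners) and leaves f_0, f_2, f_3, and f_5 untouched rather than amplifying them, and the input is a toy distribution (an octatonic collection with extra weight on C, E, and G), not corpus data.

```python
import cmath

def dft(x):
    n = len(x)
    return [sum(x[p] * cmath.exp(-2j * cmath.pi * k * p / n) for p in range(n))
            for k in range(n)]

def idft(X):
    """Inverse DFT, returning the real part (inputs here are conjugate-symmetric)."""
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * p / n)
                for k in range(n)).real / n for p in range(n)]

def tonal_filter(x, keep=(0, 2, 3, 5)):
    """Keep the listed components and their conjugate partners, zero the rest."""
    n = len(x)
    kept = set(keep) | {(n - k) % n for k in keep}
    X = dft(x)
    return idft([X[k] if k in kept else 0 for k in range(n)])

# Toy distribution (hypothetical weights).
OCTATONIC = {0, 1, 3, 4, 6, 7, 9, 10}
EMPHASIZED = {0, 4, 7}
raw = [(p in OCTATONIC) + (p in EMPHASIZED) for p in range(12)]
x = [v / sum(raw) for v in raw]

filtered = tonal_filter(x)
X, Y = dft(x), dft(filtered)
print(round(abs(X[4]), 4), round(abs(Y[4]), 4))  # f_4 before and after
print(round(abs(X[5]), 4), round(abs(Y[5]), 4))  # f_5 before and after
```

In the toy input f_4 is the largest component, yet after filtering only the contribution of the emphasized notes to f_2, f_3, and f_5 remains, which is the mechanism described above for a tonal reading of octatonic material.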

3 Key Finding 

Many studies have approached the question of key from the standpoint of artificial
intelligence, by developing and testing key-finding algorithms. Distributional
approaches emerge overwhelmingly as state-of-the-art from a survey of the key-finding
literature since Krumhansl and Schmuckler [10] developed the first distributional
algorithm. A number of similar algorithms have been proposed, with
the major points of distinction being the use of different ground truth distributions
for each key and differences in how distributions are calculated for each
piece.

While most key-finding algorithms use correlation between profiles to determine
a best key, Albrecht and Shanahan ([1]) show good results for an algorithm
that uses Euclidean distances. The Euclidean distance of two distributions is simply
√(∑(x_i − y_i)^2) over twelve pitch-classes, with the distributions normalized
such that ∑(x_i) = ∑(y_i) = 1. The DFT helps illuminate the differences between
these approaches.
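The two criteria can be compared side by side in a minimal sketch. Everything concrete here is a stand-in: the major-key template uses made-up weights (scale tones at 1, tonic and dominant at 2) rather than any published key profile, only major keys are tried, and the "piece" is a toy pitch-class count.

```python
import math

def normalize(v):
    s = sum(v)
    return [a / s for a in v]

def euclidean(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    return num / den

def rotate(profile, t):
    """Transpose a C-based profile so that its tonic sits on pitch class t."""
    return [profile[(p - t) % 12] for p in range(12)]

# Hypothetical template, not a published profile.
MAJOR = normalize([2, 0, 1, 0, 1, 1, 0, 2, 0, 1, 0, 1])

def best_by_correlation(piece):
    return max(range(12), key=lambda t: pearson(piece, rotate(MAJOR, t)))

def best_by_euclidean(piece):
    return min(range(12), key=lambda t: euclidean(piece, rotate(MAJOR, t)))

# Toy pitch-class counts with G and D emphasized.
piece = normalize([1, 0, 3, 0, 1, 0, 1, 4, 0, 1, 0, 2])
print(best_by_correlation(piece), best_by_euclidean(piece))
```

With a single template shape the two criteria necessarily rank the keys identically, because all transpositions share the same mean and norm; differences can only arise once major and minor templates of different shape are compared.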

Figure 6 compares the two methods. For the distributions typical of tonal 
music, like the Bach example, they match very closely, and both reflect circle- 
of-fifths distances. (The preferred key according to both methods, G major, 
however, is incorrect for this E minor chorale.) The second example is a distribution
from a tonally ambiguous eight-measure theme that suggests both A
minor and E minor. The tonal ambiguity is reflected by the similar scores for 
these two keys, but still the two methods, Euclidean and correlational, give very 
similar results. 

Both methods may be better understood through basic Fourier theorems.
Euclidean distances remain Euclidean distances after the DFT, measured in a
direct product of complex planes and scaled by 1/√12, by the unitarity principle
(i.e., orthogonality). Furthermore, the convolution theorem says that correlations
become dot products after the DFT: F(f * g) = F(f) · F(g). When the magnitudes
of DFT components match (e.g. when comparing two tonal distributions
with large f_2, f_3, and f_5), both measures will reflect the phase differences of the
prominent components, and therefore they will tend to agree, the only difference
being that correlation will be even more strongly biased towards the components
that are large in both distributions (and hence will favor f_5 more strongly when
comparing tonal distributions). Therefore, a simple explanation of how distributional
key finding works is that the scale is selected by Ph_5 and the mode
by Ph_3 or Ph_2. The same results could therefore be derived from proximity in
Ph_{3/5}-space.

Fig. 6. Correlations and Euclidean distances comparing distributions from two tonal
pieces to key profiles from Albrecht and Shanahan [1].
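Both Fourier facts invoked above can be checked numerically. In this sketch the cross-correlation is defined as c[t] = Σ x[p]·y[p + t], for which the transform is the componentwise product of the conjugate of one spectrum with the other; that conjugate is a detail of this particular definition, stated here as an assumption about how the correlation is set up. The two inputs are toy distributions (a triad and a scale).

```python
import cmath
import math

def dft(x):
    n = len(x)
    return [sum(x[p] * cmath.exp(-2j * cmath.pi * k * p / n) for p in range(n))
            for k in range(n)]

def cross_correlation(x, y):
    """c[t] = sum over p of x[p] * y[(p + t) mod n]."""
    n = len(x)
    return [sum(x[p] * y[(p + t) % n] for p in range(n)) for t in range(n)]

# Toy distributions (illustrative only): C major triad and C major scale.
x = [1 / 3 if p in (0, 4, 7) else 0.0 for p in range(12)]
y = [1 / 7 if p in (0, 2, 4, 5, 7, 9, 11) else 0.0 for p in range(12)]
X, Y = dft(x), dft(y)

# Unitarity: distance before the DFT equals distance after it divided by sqrt(12).
dist_pc = math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))
dist_dft = math.sqrt(sum(abs(a - b) ** 2 for a, b in zip(X, Y)))
unitarity_gap = abs(dist_pc - dist_dft / math.sqrt(12))

# Correlation theorem: the DFT of the cross-correlation is a componentwise product.
C = dft(cross_correlation(x, y))
correlation_gap = max(abs(C[k] - X[k].conjugate() * Y[k]) for k in range(12))

print(unitarity_gap, correlation_gap)
```

Because each correlation term is a product of two magnitudes, a component that is small in either distribution contributes almost nothing, whereas in the distance it still contributes the square of the larger magnitude.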

When distributions emphasize different periodicities, particularly where a 
DFT component is large in one distribution and close to zero in the other, the two 
methods respond differently. Correlation will simply suppress such components
(since the influence of a component is weighted by a product of magnitudes). 
The Euclidean measure will include a constant value that is uninfluenced by 
changes of phase (i.e., transposition). Therefore the range of Euclidean distances 
will contract more noticeably when comparing non-tonal distributions to the 24 
key profiles, but the difference should not usually affect the choice of best key. 

Figure 7 shows two instances where the two methods do choose different
keys, the first eight measures of Schubert’s song “Dass sie hier gewesen” and the
subject of the F♯ minor fugue from Bach’s WTC I. Both distributions suppress
f_2 and f_3 in favor of some non-tonal component, f_4 in Schubert’s case (because
of a heavily emphasized vii°7/ii chord) and f_1 in Bach’s (because the subject is
very chromatic and restricted in range). As a result, all components except for f_5
are effectively canceled out for both the correlational and Euclidean criteria. In
this situation, correlation is biased towards major keys, because the major-key
distribution has a slightly higher |f_5|. Euclidean distance chooses the key that is
closest in Ph_5, which happens to be minor in both instances, whereas correlation
chooses the closest major key. As a result, correlation selects the correct key for
Schubert (C major as opposed to G minor) but Euclidean distance selects the
correct key for Bach (F♯ minor rather than E major). The bias of correlation
towards the major mode is a likely explanation of Albrecht and Shanahan’s
finding that an algorithm using Euclidean distances performs considerably better
than others on minor-mode pieces, but somewhat worse on the major mode
(Fig. 7).



Fig. 7. Correlations and Euclidean distances comparing distributions from two tonal
pieces to key profiles from Albrecht and Shanahan [1].


Studies on human subjects have been much more attentive to the influence of
temporal ordering on perceptions of key than the key-finding literature. Since distributions
collapse the temporal dimension, they implicitly assume that the temporal
order of pitch-classes does not influence the sense of key, even though experimental
studies such as [4,5,15] have amply demonstrated the importance of
temporal order to key inferences. Approaches to key finding that deal with modulation
by using windowed analysis, such as Temperley’s ([21,22]) and Sapp’s
([19]), may partially address this concern. But these only allow for the sense
of key to change over time; they do not propose means by which the temporal
ordering of pitch classes may influence the sense of key beyond assuming that
more recently occurring pitches will have a stronger influence. More promising is
Quinn’s [18] approach of treating progressions as the basic elements of tonality
rather than chords (built upon by White [23]). The tonal filter may provide a
way of “fuzzying” the concept of chord, with progressions as characteristic kinds
of motions in Ph_{2/3/5} space.

4 The Tonal Index 

We have observed that typical distributions in tonal music feature three
prominent components, $f_2$, $f_3$, and $f_5$, but also that a two-dimensional space
using any selection from $Ph_2$, $Ph_3$, and $Ph_5$ is sufficient to represent the
tonal implications of a particular distribution. The reason is that typical tonal
distributions seem to be constrained to keep the quantity $Ph_2 + Ph_3 - Ph_5$, the
tonal index, close to zero. Figure 8 provides an example of how the tonal index
tends to stay very consistently close to zero in the windowed analysis of a tonal piece.

Fig. 8. $Ph_2$ (green), $Ph_3$ (blue), and $Ph_5$ (pink) and the tonal index (blue, lower
graphs) in a windowed analysis of Corelli’s Violin Sonata Op. 5/1 mvt. 2, aligned with
a harmonic summary of the score. (Color figure online)
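
The tonal index of a pitch-class set or weighted distribution can be computed directly from its discrete Fourier transform. The following is a minimal sketch, not the author's implementation: it assumes the standard DFT sign convention, phases rescaled so that a full turn equals 12, and a result wrapped into [−6, 6). The helper names `indicator`, `phase`, and `tonal_index` are hypothetical.

```python
import cmath
import math

def indicator(pcs):
    """12-element weight vector with weight 1 on each pitch class in pcs."""
    return [1.0 if pc in pcs else 0.0 for pc in range(12)]

def phase(distribution, k):
    """Phase of the k-th DFT coefficient, rescaled so a full turn equals 12.

    Assumption: coefficient = sum_pc w(pc) * exp(-2*pi*i*k*pc/12).
    """
    coeff = sum(w * cmath.exp(-2j * math.pi * k * pc / 12)
                for pc, w in enumerate(distribution))
    return cmath.phase(coeff) * 12 / (2 * math.pi)

def tonal_index(distribution):
    """Ph_2 + Ph_3 - Ph_5, wrapped into the interval [-6, 6)."""
    raw = (phase(distribution, 2) + phase(distribution, 3)
           - phase(distribution, 5))
    return (raw + 6) % 12 - 6

major = tonal_index(indicator({0, 4, 7}))
minor = tonal_index(indicator({0, 3, 7}))
fifth = tonal_index(indicator({0, 7}))
diatonic = tonal_index(indicator({0, 2, 4, 5, 7, 9, 11}))
print(major, minor, fifth, diatonic)
```

Under these assumptions the perfect fifth and the diatonic scale give an index of zero (up to rounding error), while the major and minor triads give small values of opposite sign, negative for major and positive for minor.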


The tonal index is equal to zero for certain basic, mode-neutral, pitch-class
sets: unisons, perfect fifths, and diatonic scales. This is related to the
mathematical fact that, for generated collections, an index of this type can only
take two values, 0 or 6.⁴ For major and minor triads, it is small, ±0.62. The
non-composer-specific distributions in Figs. 1, 2, 3 range from −0.20 to −1.04 for
major and 0.60 to 1.05 for minor, averaging −0.67 and 0.84.⁵ The Prince/Schmuckler
data of Fig. 5 gives averages of −0.51 and 0.41 and some evidence of historical trends.
In major, the index for composers up to Brahms ranges just from −0.79 to
−0.42, averaging −0.64, very close to the major triad value. The late tonal
styles of Liszt and Scriabin give values much higher and closer to zero, 0.14
and 0.05. In the minor mode, Bach stands out somewhat with an index of 0.49,
late-eighteenth/early-nineteenth century composers range from 0.85 to 1.16, and
Brahms seems to group with Liszt and Scriabin with indexes again close to zero:
0.08, 0.11, 0.23. This suggests that the late tonal style may be characterized by
the attenuation of this aspect of the major-minor distinction.

⁴ This is a consequence of Amiot’s [3] Proposition 4.3, which also appears in [25] but
missing a ±, crucial for the recognition of 6 (or more generally π) as a possible
value. This can also be extended to other inversionally symmetrical collections using
Proposition 6.8 from [3]. Thanks to Emmanuel Amiot for these observations.

⁵ The averaging is done in the complex plane on normalized values.
Abstract. We present the notion of abstract gestures and show how
it encompasses Mazzola’s notions of gestures on topological spaces and 
topological categories, the notion of diagrams in categories, and our 
notion of gestures on locales. A relation to formulas is also discussed. 


1 Introduction 

Soon after completing the first version of The Topos of Music [9],
an enterprise that achieved a topos-theoretic framework for musicology
(a theory of performance included) and that gave a very complete account of
the mathematical structures present in music, Mazzola became aware that
his own activity as a free jazz pianist had little to do with the structures and
procedures described in his monograph. Gestures, rather than formulas, were
the essence of his performance. Certainly, improvisation in free jazz is mainly
determined by the movements of the body’s limbs, that is, by a dancing of the
body, the classical structures of western music being secondary and auxiliary.
A rigorous reflection on gestures is therefore necessary, and not only in the case of
musical improvisation, but in music in general, since all its power and intensity
relies on its realization in bodily terms, even in the western classical tradition.

The point of departure towards a formal definition of gesture is the one given by
Hugues de Saint-Victor in Chapter XII of his De Institutione Novitiorum [12]:

Gestus est motus et figuratio membrorum corporis, ad omnem agendi
et habendi modum.

[Gesture is the movement and configuration of the body’s limbs, towards
all an action and having a modality.]¹

Based on this definition, Mazzola gives the first mathematical definition of a
gesture as a diagram of curves in a topological space (see Sect. 3 for the
precise definition); here the diagram corresponds to the configuration of the
body’s limbs and the topological space corresponds to the space-time where the
movement occurs. Further, this definition is generalized to topological categories
in [11] to include both algebraic and topological information in gestures, and then
to locales in [1] as a first step to define gestures on generalized notions of space.
These different instances of defining gestures belong, though not so strictly, to
the topological branch of the theory of gestures.

¹ Our translation.

On the other hand, there is an algebraic counterpart of this. In [10, p. 39],
Mazzola defines a formula in a spectroid² as a suitable diagram in this particular
kind of linear category, which is the starting point to develop a mathematical
framework for both the theory of nets and Lewin’s transformational theory.

It is important to stress that all these different definitions rely on the notion
of digraph: both gestures and formulas are morphisms of digraphs with domain
a given skeleton. Moreover, following Mazzola’s ideas, these instances can be
regarded as attempts to reanimate the implicit movement that the drawing
of a digraph by means of arrows and nodes suggests. In Mazzola’s own
words [10, p. 25]:

The gesture is a morphism, where the linkage is a real movement and not
only a symbolic arrow without bridging substance.

Regarding these two branches, there are two main problems. The first one 
deals with the search for a common universe: that is, the diamond conjecture. 
The second one corresponds to a gestural representation of categories in which 
composition of arrows can be manipulated at the level of gestural intuitions, 
in much the same way as the Yoneda embedding allows the representation of 
categories in topoi of presheaves. To a great extent, topological categories were 
introduced in gesture theory so as to construct a bicategory of gestures proposed 
by Mazzola as a first step to solve these two problems. 

It is remarkable that gestures and formulas are at the core of the relation
between mathematics and music. Mazzola has proposed a fundamental conceptual
adjunction

$$\text{formulas} \;\underset{\text{mathematics}}{\overset{\text{music}}{\rightleftarrows}}\; \text{gestures},$$

where the arrows correspond to the activities of the disciplines: mathematicians
take gestures (intuitions, mental movements, analogies with reality, ...) to produce
formulas, musicians take formulas (scores, diagrams, musical notations, ...) to
produce gestures. The term adjunction refers to a relation that is more profound
than a mere inversion or isomorphism; it corresponds to a true dialectic that is
grasped formidably by the categorical concept of adjunction between functors.
Certainly the diamond conjecture is the search for such an adjunction in precise
mathematical terms.

This article is an overview of a general framework for gesture theory that
could unify the definitions of gestures on several notions of space (more related to
the topological branch of mathematical music theory) and the notions of formulas
in spectroids and of diagrams in categories (more related to the algebraic branch
of mathematical music theory). In addition, this framework is flexible enough
to introduce gestural ideas in other fields of mathematics given its
category-theoretic nature.

² See Sect. 7 or [4, p. 29] for the definition of spectroid. Spectroids were introduced by
Pierre Gabriel in representation theory of quivers or digraphs; details can be found
in [4].

The structure of this article is that of a theme with variations. We first present 
the notion of abstract gestures and then proceed to unfold different realizations 
thereof. Justifications for all statements that are not proved in this article will 
be found in [2]. 

2 Abstract Gestures 

Before giving the definition of gestures, we recall some basic definitions and fix
the notation.

Directed graphs and internal digraphs 

Let $G_1$ be the category with two parallel arrows between two vertices $[0]$, $[1]$ plus
the identities; it can be depicted as follows:

$$\mathrm{id} \;\circlearrowright\; [0] \underset{e_1}{\overset{e_0}{\rightrightarrows}} [1] \;\circlearrowleft\; \mathrm{id}.$$

A directed graph (or digraph, for short) is a tuple $\Gamma = (A, V, t, h)$, where
$A$, $V$ are sets and $t, h : A \to V$ are functions. Digraphs correspond bijectively
to presheaves on the category $G_1$, so from now on we identify a digraph
$\Gamma = (A, V, t, h)$ with its associated presheaf $\Gamma : G_1^{op} \to \mathbf{Set}$ defined by
$\Gamma([1]) = A$, $\Gamma([0]) = V$, $\Gamma(e_0) = t$, $\Gamma(e_1) = h$. In this way, there is a topos of
digraphs, namely the Grothendieck topos³

$$\mathbf{Digraph} := \mathbf{Set}^{G_1^{op}}.$$

Thus, a morphism from $\Gamma_1 = (A_1, V_1, t_1, h_1)$ to $\Gamma_2 = (A_2, V_2, t_2, h_2)$ (that is, a
natural transformation) corresponds to a pair of functions $(u, v)$, with $u : A_1 \to A_2$
and $v : V_1 \to V_2$, satisfying the identities

$$v t_1 = t_2 u, \qquad v h_1 = h_2 u.$$
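
For finite digraphs, the two identities can be checked mechanically. The sketch below is only an illustration under stated assumptions: digraphs are encoded as dictionaries with keys `A`, `V`, `t`, `h`, and the function name `is_digraph_morphism` is hypothetical.

```python
def is_digraph_morphism(g1, g2, u, v):
    """Check that (u, v) is a morphism of digraphs g1 -> g2.

    Each digraph is a dict with a set of arrows 'A', a set of vertices 'V',
    and dicts 't' (tail) and 'h' (head) from arrows to vertices.
    u maps arrows to arrows, v maps vertices to vertices.
    """
    # u and v must be total functions landing in the right sets.
    well_typed = (set(u) == set(g1["A"]) and set(v) == set(g1["V"])
                  and set(u.values()) <= set(g2["A"])
                  and set(v.values()) <= set(g2["V"]))
    if not well_typed:
        return False
    # The identities v t_1 = t_2 u and v h_1 = h_2 u, arrow by arrow.
    return all(v[g1["t"][a]] == g2["t"][u[a]]
               and v[g1["h"][a]] == g2["h"][u[a]]
               for a in g1["A"])

# An arrow x -> y and a loop on a single vertex p.
arrow = {"A": {"a"}, "V": {"x", "y"}, "t": {"a": "x"}, "h": {"a": "y"}}
loop = {"A": {"l"}, "V": {"p"}, "t": {"l": "p"}, "h": {"l": "p"}}

arrow_to_loop = is_digraph_morphism(arrow, loop, {"a": "l"}, {"x": "p", "y": "p"})
loop_to_arrow = is_digraph_morphism(loop, arrow, {"l": "a"}, {"p": "x"})
print(arrow_to_loop, loop_to_arrow)
```

The arrow can be wrapped around the loop, but the loop cannot be sent onto the arrow, because the head condition fails.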

Similarly, if $\mathcal{C}$ is an arbitrary category, a functor $S : G_1^{op} \to \mathcal{C}$ can be
identified with a tuple $(S_1, S_0, e_0, e_1)$, that is, with the diagram

$$S_1 \underset{e_1}{\overset{e_0}{\rightrightarrows}} S_0$$

of morphisms of $\mathcal{C}$ by putting $S_1 = S([1])$, $S_0 = S([0])$, $e_0 = S(e_0)$, $e_1 = S(e_1)$. A
tuple $(S_1, S_0, e_0, e_1)$, where $e_0, e_1 : S_1 \to S_0$ are morphisms of $\mathcal{C}$, is called an
internal digraph in $\mathcal{C}$. In this way, functors $S : G_1^{op} \to \mathcal{C}$ can be identified with
internal digraphs in $\mathcal{C}$.

³ Any category of presheaves on a small category is a Grothendieck topos. In fact,
given a category of presheaves on a small category $\mathcal{C}$, it is a category of sheaves if
we consider on $\mathcal{C}$ the trivial topology, whose unique covering sieve for each object of
$\mathcal{C}$ is the maximal sieve.

The category of elements

Given a presheaf $P : \mathcal{C}^{op} \to \mathbf{Set}$ on a category $\mathcal{C}$, the category of elements of
$P$, denoted by $\int P$, is defined as follows. Its objects are pairs $(C, p)$ where $C$ is an
object of $\mathcal{C}$ and $p \in P(C)$, and a morphism from $(C', p')$ to $(C, p)$ is a morphism
$u : C' \to C$ of $\mathcal{C}$ such that $P(u)(p) = p'$. Also, there is a projection functor
$\pi_P : \int P \to \mathcal{C}$ sending $u : (C', p') \to (C, p)$ to its underlying morphism
$u : C' \to C$.

In the case when $\mathcal{C} = G_1$, note that the category $\int\Gamma$ of elements of a
digraph $\Gamma = (A, V, t, h)$ can be identified with the category whose set of objects
is $A \sqcup V$ and whose morphisms are the identities and the pairs of the form $(t(a), a)$
or $(h(a), a)$ where $a \in A$, domains and codomains being the first and second
projections respectively. With this identification the projection $\int\Gamma \to G_1$ sends
the vertices in $V$ to $[0]$, the arrows in $A$ to $[1]$, $(t(a), a)$ to $e_0$, and $(h(a), a)$ to $e_1$.
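
To make this identification concrete, the following sketch enumerates the objects and non-identity morphisms of the category of elements of a finite digraph, together with the projection to the two-object category. It is an illustration only: the function name and the encoding are assumptions, and each morphism is tagged with `e0` or `e1` so that the two morphisms attached to an arrow stay distinct even when its tail and head coincide.

```python
def category_of_elements(A, V, t, h):
    """Objects, non-identity morphisms, and projection for a finite digraph.

    Objects are tagged pairs ('vertex', x) or ('arrow', a).
    Each arrow a contributes one morphism t(a) -> a lying over e0
    and one morphism h(a) -> a lying over e1.
    """
    objects = [("vertex", x) for x in V] + [("arrow", a) for a in A]
    morphisms = []
    for a in A:
        morphisms.append({"source": ("vertex", t[a]),
                          "target": ("arrow", a), "over": "e0"})
        morphisms.append({"source": ("vertex", h[a]),
                          "target": ("arrow", a), "over": "e1"})
    # The projection sends vertices to [0] and arrows to [1].
    projection = {obj: "[0]" if obj[0] == "vertex" else "[1]"
                  for obj in objects}
    return objects, morphisms, projection

# A loop a at x together with an arrow b from x to y.
objects, morphisms, projection = category_of_elements(
    A=["a", "b"], V=["x", "y"],
    t={"a": "x", "b": "x"}, h={"a": "x", "b": "y"})
print(len(objects), len(morphisms))
```

A digraph with |A| arrows and |V| vertices thus yields |A| + |V| objects and 2|A| non-identity morphisms.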


2.1 Realizations 

As we will see throughout this article, the concept of realization of a digraph is
closely related to that of gestures. In fact, realization and gestures are dual
concepts (Subsect. 2.2). For simplicity, we start with realization.

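Let $\mathcal{C}$ be a category with small hom-sets, $\Gamma : G_1^{op} \to \mathbf{Set}$ a digraph, and
$T : G_1 \to \mathcal{C}$ a functor. We define the realization of $\Gamma$ with respect to $T$, denoted by
$|\Gamma|_T$, as the colimit in $\mathcal{C}$ of the functor

$$\textstyle\int\Gamma \xrightarrow{\;\pi_\Gamma\;} G_1 \xrightarrow{\;T\;} \mathcal{C},$$

whenever it exists.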

Since $\Gamma$ corresponds to a tuple $(A, V, t, h)$ and $T$ can be identified with a
pair of morphisms $i_0, i_1 : T_0 \to T_1$ of $\mathcal{C}$, the realization $|\Gamma|_T$ is the colimit of the
following diagram in $\mathcal{C}$: take a copy of $T_1$ for each $a \in A$, a copy of $T_0$ for each
$x \in V$, a copy of $i_0$ whenever $t(a) = x$, and a copy of $i_1$ whenever $h(a) = x$.

If the realization $|\Gamma|_T$ exists for each digraph $\Gamma$, then there is a functor

$$|\_|_T : \mathbf{Digraph} \to \mathcal{C}$$

which is left adjoint⁴ to the functor $\mathcal{C}(T, \_)$ that sends each object $C$ of $\mathcal{C}$ to
the digraph $\mathcal{C}(T(\_), C)$. This means that for each digraph $\Gamma$ and each object $C$ of
$\mathcal{C}$ there is a bijection

$$\mathcal{C}(|\Gamma|_T, C) \cong \mathbf{Digraph}(\Gamma, \mathcal{C}(T(\_), C)),$$

natural in both arguments $\Gamma$ and $C$. As we will see, this adjunction is very useful
in the theory of gestures.

⁴ See the theorem at [8, p. 47], which holds for cocomplete categories. This theorem
remains valid if we only assume the existence of the colimits involved in the definition
of $|\_|_T$.

2.2 Definition

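Let $\mathcal{C}$ be a category with small hom-sets. Given a digraph $\Gamma : G_1^{op} \to \mathbf{Set}$ and
a functor $S : G_1^{op} \to \mathcal{C}$, we define the object of gestures with skeleton $\Gamma$
with respect to $S$, denoted by $\Gamma@S$, as the limit of the functor

$$(\textstyle\int\Gamma)^{op} \xrightarrow{\;\pi_\Gamma^{op}\;} G_1^{op} \xrightarrow{\;S\;} \mathcal{C},$$

whenever it exists.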

Following this definition, since $\Gamma$ corresponds to a tuple $(A, V, t, h)$ and $S$ can
be identified with an internal digraph $(S_1, S_0, e_0, e_1)$ in $\mathcal{C}$, the object of gestures
with skeleton $\Gamma$ with respect to $S$ is the limit of the following diagram in $\mathcal{C}$: take a
copy of $S_1$ for each $a \in A$, a copy of $S_0$ for each $x \in V$, a copy of $e_0 : S_1 \to S_0$
whenever $t(a) = x$, and a copy of $e_1 : S_1 \to S_0$ whenever $h(a) = x$.

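On the other hand, note that this definition is the dual of that of realization.
To see this, replace $\mathcal{C}$ by $\mathcal{C}^{op}$ in the definition of the realization of $\Gamma$ with respect
to $T$ (Subsect. 2.1). In this way, we obtain that the realization of a digraph
with respect to a functor $T : G_1 \to \mathcal{C}^{op}$ (which corresponds uniquely to a functor
$S : G_1^{op} \to \mathcal{C}$ by applying $(\_)^{op}$) is

$$|\Gamma|_T = \operatorname{Colim}\Big(\textstyle\int\Gamma \xrightarrow{\pi_\Gamma} G_1 \xrightarrow{T} \mathcal{C}^{op}\Big) = \operatorname{Lim}\Big((\textstyle\int\Gamma)^{op} \xrightarrow{\pi_\Gamma^{op}} G_1^{op} \xrightarrow{S} \mathcal{C}\Big) = \Gamma@S.$$

So we have the following delicate and fundamental fact:

The concept of gestures is the dual of that of realization.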

By dualizing the case of the realization functor, if the object of gestures $\Gamma@S$
exists for each digraph $\Gamma$, then there is a functor

$$\_@S : \mathbf{Digraph}^{op} \to \mathcal{C}$$

which is right adjoint to the functor $\mathcal{C}(\_, S)$ that sends each object $C$ of $\mathcal{C}$ to
the digraph $\mathcal{C}(C, S(\_))$. This means that for each digraph $\Gamma$ and each object $C$
of $\mathcal{C}$ there is a bijection

$$\mathbf{Digraph}(\Gamma, \mathcal{C}(C, S(\_))) \cong \mathcal{C}(C, \Gamma@S),$$

natural in both arguments $\Gamma$ and $C$.
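
As a hedged sanity check of the two constructions (a worked example added here for illustration, consistent with the equalizer computation of Example 5 and the path objects used later), consider the two smallest non-trivial skeleta: the single arrow and the single loop. For the arrow, the diagram has the copy of $T_1$ (respectively $S_1$) as its terminal (respectively initial) vertex, so the colimit (respectively limit) is that object itself; for the loop, the diagram is a parallel pair.

```latex
% Arrow digraph: A = {a}, V = {x, y}, t(a) = x, h(a) = y.
\[
|\bullet \to \bullet|_T
  = \operatorname{Colim}\bigl(T_0 \xrightarrow{i_0} T_1 \xleftarrow{i_1} T_0\bigr)
  \cong T_1,
\qquad
(\bullet \to \bullet)@S
  = \operatorname{Lim}\bigl(S_0 \xleftarrow{e_0} S_1 \xrightarrow{e_1} S_0\bigr)
  \cong S_1 .
\]
% Loop digraph: A = {a}, V = {x}, t(a) = h(a) = x.
\[
|\circlearrowleft|_T
  = \operatorname{Coeq}\bigl(i_0, i_1 : T_0 \rightrightarrows T_1\bigr),
\qquad
\circlearrowleft@S
  = \operatorname{Eq}\bigl(e_0, e_1 : S_1 \rightrightarrows S_0\bigr).
\]
```

In words: gestures on a single arrow are just the "arrows" of the internal digraph, and gestures on a loop are those arrows whose two endpoints agree.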

In particular, if the category $\mathcal{C}$ has a terminal object $1$, then we obtain a
bijection between the set $\mathcal{C}(1, \Gamma@S)$ of points of $\Gamma@S$ and

$$\mathbf{Digraph}(\Gamma, \mathcal{C}(1, S(\_))).$$

The digraph $\mathcal{C}(1, S(\_))$ is called the underlying digraph of the internal
digraph $S$.


2.3 Hypergestures

Let $C$ be an object of $\mathcal{C}$ and $T : G_1 \to \mathcal{C}$ a functor whose images $T_0, T_1$ are
exponentiable in $\mathcal{C}$. We define the internal digraph $S_C$ of $C$ with respect to $T$ as the
composite

$$G_1^{op} \xrightarrow{\;T^{op}\;} \mathcal{C}^{op} \xrightarrow{\;C^{(-)}\;} \mathcal{C},$$

which is, of course, a contravariant functor. In this case, given a digraph $\Gamma$,
we write $\Gamma@C$ instead of $\Gamma@S_C$, and call it the object of gestures with skeleton
$\Gamma$ and body in $C$, whenever the limit exists. This construction implies that of
hypergestures: if $\Gamma'$ is another skeleton, we can construct the object $\Gamma'@\Gamma@C$,
and so on, depending on the existence of suitable limits in $\mathcal{C}$.

This construction of hypergestures is the main reason why we have
defined the object of gestures $\Gamma@S$ with skeleton $\Gamma$ with respect to an internal
digraph $S$. In particular, when the internal digraph is $S_C$ we have defined the
object of gestures with skeleton $\Gamma$ and body in $C$ rather than an individual
gesture. Certainly, the key point of the construction of hypergestures is that $\Gamma@C$
is an object of $\mathcal{C}$ again, so that it can be regarded as a new body for gestures and
we can iterate the construction.


2.4 Gestures from External Digraphs 

The preceding construction of hypergestures relies on the existence of suitable
exponentials. However, exponentials are not always available, so
we introduce the following notion of external digraph of an object. Besides, this
construction allows us to give the notion of a gesture, in contrast to our preceding
definition of the object of gestures.

Let $\mathcal{C}$ be a category with small hom-sets, and $T : G_1 \to \mathcal{C}$ a functor such
that the realization functor $|\_|_T$ exists. Then given an object $C$ of $\mathcal{C}$, we define
the external digraph $s_C$ of $C$ as the composite

$$G_1^{op} \xrightarrow{\;T^{op}\;} \mathcal{C}^{op} \xrightarrow{\;\mathcal{C}(-, C)\;} \mathbf{Set},$$

which coincides with its underlying digraph (Subsect. 2.2) since it is a functor to
Set. Therefore, according to Subsect. 2.2, we have a bijection between the points
of $\Gamma@s_C$ (that is, its elements) and the set

$$\mathbf{Digraph}(\Gamma, \mathcal{C}(T(\_), C)).$$

Consequently, in this case, we can define a gesture with skeleton $\Gamma$ and body in
$C$ with respect to the cosimplicial object $T$ as a morphism

$$\delta : \Gamma \to s_C$$

of digraphs. In this way, the set of gestures $\Gamma@s_C$ is completely determined by
all the individual gestures $\delta$, in contrast to the case of the locales of gestures,
which need not be characterized by their points (see Sect. 4).






Note that, in turn, $s_C$ coincides with the value at $C$ of the right adjoint to the
realization functor (Subsect. 2.1) and hence there is a bijection

$$\mathcal{C}(|\Gamma|_T, C) \cong \mathbf{Digraph}(\Gamma, \mathcal{C}(T(\_), C)).$$

Thus, individual gestures with skeleton $\Gamma$ and body in $C$ correspond bijectively
to morphisms from the realization $|\Gamma|_T$ to $C$.

2.5 An Orientation 

Now we proceed to the study of the particular examples. Figure 1 offers an
orientation for the different variations to be considered. It shows the different
incarnations of the functors $T$ and $S$ used in the definition of gestures as well
as the respective bodies of the gestures. Note that the gestures related to
columns 2-5 (left to right) come from internal digraphs of objects of the
respective categories and hence yield hypergestures. This is not the case for the gestures
of column 6, where $S$ is an external digraph $s_K$. Despite this, as we have
observed, it makes sense to construct individual gestures and to say that $K$
is the body, but in this case, the object of gestures is not enriched as in the
preceding ones. The examples from columns 2-6 correspond to Sections
3-7 of this article, in order-preserving correspondence.

Fig. 1. Ingredients for defining gestures in different categories.


3 Gestures on Topological Spaces 

Let $\Gamma$ be a digraph, $X$ a topological space, and $I = [0,1]$ the unit interval in
$\mathbb{R}$. In the sequel, we will denote the set of opens of the topological space $X$
by $\mathcal{O}(X)$.

First, we construct the space $X^I$ of paths in $X$. In fact, the space $I$ is an
exponentiable object in Top by Theorem [3, 5.3]: it is a locally compact space⁵,
so $\mathcal{O}(I)$ is a continuous lattice⁶ by Lemma [6, VII.4.2]. Furthermore, the
exponential $X^I$ is the set $\mathbf{Top}(I, X)$ of continuous maps from $I$ to $X$ endowed with
the compact-open topology.

⁵ A topological space $X$ is said to be locally compact if for each point $x \in X$ and each
open neighborhood $U$ of it, there is a compact neighborhood of $x$ contained in $U$.
In the case when $X$ is a Hausdorff space, this definition is equivalent to saying that
each point in $X$ has a compact neighborhood. In this way, every compact Hausdorff
space is locally compact.

The internal digraph in Top to be considered in this instance is the spatial
digraph $\vec{X}$ of the space $X$. It is the tuple $(X^I, X, e_0, e_1)$, where $e_0$ and $e_1$ are
obtained by applying the functor $X^{(-)}$ to the inclusions $i_0, i_1 : \{*\} \to I$ of the
endpoints. Note that $\vec{X}$ corresponds to the functor $S_X$ defined in Subsect. 2.3.

In this way, since the category Top of all topological spaces has all small
limits, following the definition in Subsect. 2.3, we have the space $\Gamma@X$ of gestures
with skeleton $\Gamma$ and body in $X$. However, in [10], Mazzola first defines a gesture
as a diagram of curves in the topological space $X$, that is, a morphism of digraphs

$$\delta : \Gamma \to \vec{X},$$

where $\vec{X}$ is regarded as a digraph by forgetting the topological structure. This
means that the spatial digraph $\vec{X}$ can be identified with its underlying digraph
(Subsect. 2.2) since topological spaces are determined by their points. In this
way, the elements of $\Gamma@X$ correspond bijectively to these individual gestures $\delta$
according to our discussion of points of objects of gestures in Subsect. 2.2.

Example 1. Consider the case when $X = \mathbb{R}^2$. In this case, the spatial digraph
$\vec{\mathbb{R}^2}$ of $\mathbb{R}^2$ is the tuple

$$(\mathbf{Top}(I, \mathbb{R}^2), \mathbb{R}^2, e_0, e_1),$$

where $e_0$ (respectively $e_1$) sends a continuous curve $c : I \to \mathbb{R}^2$ to $c(0)$
(respectively $c(1)$). In this way, the digraph $\vec{\mathbb{R}^2}$ has as arrows all continuous curves in
$\mathbb{R}^2$ and as vertices all points in $\mathbb{R}^2$.

Now suppose that $\Gamma$ is the digraph of Fig. 2, that is,
$\Gamma = (\{a, b\}, \{x, y\}, t, h)$, where $t(a) = h(a) = t(b) = x$ and $h(b) = y$. Then a
gesture $\delta : \Gamma \to \vec{\mathbb{R}^2}$, which can be illustrated with Fig. 2, is a pair $(u, v)$, where
$u : \{a, b\} \to \mathbf{Top}(I, \mathbb{R}^2)$ and $v : \{x, y\} \to \mathbb{R}^2$ are functions satisfying the
conditions $u(a)(0) = u(a)(1) = u(b)(0) = v(x)$ and $u(b)(1) = v(y)$. In words, it
is simply a diagram of curves that match according to the configuration of $\Gamma$.
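
The matching conditions can be tested numerically for concrete curves. The sketch below is an illustration under stated assumptions: curves are plain Python functions on [0, 1] returning points of the plane, continuity is taken for granted rather than checked, equality of points is tested up to a small tolerance, and the names `close` and `is_gesture` are hypothetical.

```python
import math

def close(p, q, tol=1e-9):
    """Equality of two points of the plane up to a numerical tolerance."""
    return all(abs(pi - qi) <= tol for pi, qi in zip(p, q))

def is_gesture(t, h, u, v):
    """Check u(a)(0) = v(t(a)) and u(a)(1) = v(h(a)) for every arrow a."""
    return all(close(u[a](0.0), v[t[a]]) and close(u[a](1.0), v[h[a]])
               for a in u)

# Skeleton: a loop a at x and an arrow b from x to y.
t = {"a": "x", "b": "x"}
h = {"a": "x", "b": "y"}

# A circle based at the origin for a, a straight segment for b.
u = {
    "a": lambda s: (1 - math.cos(2 * math.pi * s), math.sin(2 * math.pi * s)),
    "b": lambda s: (s, s),
}
v = {"x": (0.0, 0.0), "y": (1.0, 1.0)}

matches = is_gesture(t, h, u, v)
broken = is_gesture(t, h, u, {"x": (0.0, 0.0), "y": (2.0, 2.0)})
print(matches, broken)
```

Moving the image of y away from the endpoint of the segment breaks the second condition, so the modified pair is no longer a gesture.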


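[Figure: the skeleton $\Gamma$, with a loop $a$ at $x$ and an arrow $b$ from $x$ to $y$, and a gesture $\delta$ realizing it by curves in the plane ending at $v(y)$.]

Fig. 2. A topological gesture $\delta$.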


On the other hand, the space $\Gamma@\mathbb{R}^2$ is the limit in Top of the diagram

$$\mathbf{Top}(I, \mathbb{R}^2) \underset{e_1}{\overset{e_0}{\rightrightarrows}} \mathbb{R}^2 \xleftarrow{\;e_0\;} \mathbf{Top}(I, \mathbb{R}^2) \xrightarrow{\;e_1\;} \mathbb{R}^2.$$

According to the construction of limits (by means of products and equalizers) in
Top, the space $\Gamma@\mathbb{R}^2$ is the subspace of the cartesian product (equipped with
the Tychonoff topology)

$$\mathbf{Top}(I, \mathbb{R}^2) \times \mathbf{Top}(I, \mathbb{R}^2) \times \mathbb{R}^2 \times \mathbb{R}^2$$

consisting of all tuples $(c_a, c_b, p_x, p_y)$ satisfying the conditions $c_a(0) = c_a(1) =
c_b(0) = p_x$ and $c_b(1) = p_y$. Note that such a tuple is essentially the same as a
gesture $\delta$. □

⁶ Or core-compact, according to the terminology in [3].

Gestures and geometric realization 

In the case when the functor $T : G_1 \to \mathbf{Top}$ corresponds to the pair of inclusions
$i_0, i_1 : \{*\} \to I$ of the endpoints, the realization $|\Gamma|$ of a digraph $\Gamma$ with respect to
$T$ always exists since Top is small cocomplete, and is often called the geometric
realization⁷ of $\Gamma$.

Example 2. Consider the digraph $\Gamma$ of Example 1. The geometric realization
$|\Gamma|$ is the colimit in Top of the diagram

$$I \underset{i_1}{\overset{i_0}{\leftleftarrows}} \{*\} \xrightarrow{\;i_0\;} I \xleftarrow{\;i_1\;} \{*\}.$$

According to the construction of colimits (via coproducts and coequalizers) in
Top, the geometric realization $|\Gamma|$ is the quotient of the disjoint union

$$(I \times \{a\}) \sqcup (I \times \{b\}) \sqcup \{x\} \sqcup \{y\}$$

by the relation $\sim$ defined by $(0, a) \sim (1, a) \sim (0, b) \sim x$ and $(1, b) \sim y$. The
resulting object is illustrated in Fig. 3. In this way, an open of the quotient
topology on $|\Gamma|$ corresponds to a tuple

$$(U_a, U_b, V_x, V_y),$$

where $U_a, U_b \in \mathcal{O}(I)$, $V_x \subseteq \{x\}$, and $V_y \subseteq \{y\}$ satisfying the conditions (i)
$0 \in U_a$ iff $1 \in U_a$ iff $0 \in U_b$ iff $x \in V_x$ and (ii) $1 \in U_b$ iff $y \in V_y$. □

Fig. 3. The way of identifying the points of the disjoint union (left-hand) and the
realization $|\Gamma|$ of the digraph from Fig. 2 (right-hand)
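
The identification of endpoints that underlies the quotient can be sketched combinatorially. The code below is only an illustration, not the topological construction itself: it tracks the endpoints of each copy of the interval and the vertices, glues them with a small union-find, and returns the resulting equivalence classes. The function name `glue_endpoints` and the encoding of points are assumptions.

```python
def glue_endpoints(A, V, t, h):
    """Equivalence classes of endpoints and vertices in the realization.

    Points are ('vertex', x) for vertices and (0, a), (1, a) for the two
    endpoints of the copy of the interval indexed by the arrow a.
    The gluing identifies (0, a) with t(a) and (1, a) with h(a).
    """
    points = [("vertex", x) for x in V] + [(s, a) for a in A for s in (0, 1)]
    parent = {p: p for p in points}

    def find(p):
        # Follow parent links to the representative, compressing the path.
        while parent[p] != p:
            parent[p] = parent[parent[p]]
            p = parent[p]
        return p

    def union(p, q):
        parent[find(p)] = find(q)

    for a in A:
        union((0, a), ("vertex", t[a]))
        union((1, a), ("vertex", h[a]))

    classes = {}
    for p in points:
        classes.setdefault(find(p), set()).add(p)
    return [frozenset(c) for c in classes.values()]

classes = glue_endpoints(A=["a", "b"], V=["x", "y"],
                         t={"a": "x", "b": "x"}, h={"a": "x", "b": "y"})
print(classes)
```

For the digraph of Example 1 this yields exactly two classes, one containing x together with both endpoints of a and the start of b, and one containing y together with the end of b.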


⁷ This name is due to Milnor, who first studied the geometric realization in the context
of algebraic topology, though for simplicial sets instead of digraphs. However, in [10],
this object is called spatialization.

By the adjunction associated with the geometric realization (Subsect. 2.1), we
have an isomorphism

$$\mathbf{Top}(|\Gamma|, X) \cong \mathbf{Digraph}(\Gamma, \vec{X}),$$

natural in both arguments $\Gamma$, $X$. Thus, a gesture with skeleton $\Gamma$ and body in $X$
is essentially a continuous map from $|\Gamma|$ to $X$; for instance, note that the gesture
in Fig. 2 can be interpreted as a continuous map from the geometric realization
in Fig. 3 to $\mathbb{R}^2$. Moreover, one may ask whether there is a homeomorphism

$$X^{|\Gamma|} \cong \Gamma@X.$$

The answer is affirmative iff $\Gamma$ is a locally finite digraph⁸, that is, iff $|\Gamma|$ is
exponentiable in Top; we omit the proof here. The important point is that this
result illustrates a basic problem in gesture theory: the reduction of objects of
gestures defined by the procedure in Subsect. 2.3 to exponentials. It is important to
stress that isomorphisms of the above type are not always possible; for example, if
the digraph has infinitely many arrows with the same tail, the above isomorphism
makes no sense. And in some respect, this is what makes topological gestures
so interesting from a strictly mathematical viewpoint; if they were reducible to
exponentials, there would be nothing new to study.

4 Gestures on Locales 

The category of frames, denoted by Frm, has as objects the complete Heyting
algebras, that is, complete lattices $L$ satisfying the infinite distributive law
$a \wedge \bigvee_{s \in S} s = \bigvee_{s \in S} (a \wedge s)$, for all $a \in L$ and $S \subseteq L$. The morphisms of frames are
the functions that preserve finite meets including 1 and arbitrary joins including
0. In particular these functions preserve the order. The category Loc of locales
is the opposite of Frm. The category Loc is small complete and cocomplete
(see [13, II.3]), the terminal object $2 = \{\emptyset, \{*\}\}$ being the locale of opens of the
singleton.
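
As a small illustration of the infinite distributive law, the opens of any topological space form a frame with intersection as meet and union as join. The sketch below checks the law exhaustively on a hand-picked topology on a three-point set; the choice of topology and the helper names `join` and `subsets` are assumptions made for the example.

```python
from itertools import chain, combinations

# A topology on {0, 1, 2}, listed by its open sets.
opens = [frozenset(s) for s in ([], [0], [2], [0, 2], [0, 1], [0, 1, 2])]

def join(family):
    """Join of a family of opens: their union (empty family gives the bottom)."""
    result = frozenset()
    for s in family:
        result |= s
    return result

def subsets(items):
    """All subfamilies of a finite list, including the empty one."""
    return chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1))

# Closure under binary unions and intersections (finite case).
is_topology = all((a | b) in opens and (a & b) in opens
                  for a in opens for b in opens)

# a meet (join of S) equals the join of (a meet s) for s in S.
frame_law = all(a & join(S) == join(a & s for s in S)
                for a in opens for S in subsets(opens))
print(is_topology, frame_law)
```

In the finite case the law reduces to ordinary distributivity of intersection over union; the point of the frame axiom is that it is required for arbitrary families.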

Let $\Gamma = (A, V, t, h)$ be a digraph and $L$ a locale. As we have already noted,
the locale $\mathcal{O}(I)$ is a continuous lattice. Therefore $\mathcal{O}(I)$ is exponentiable in Loc
(Theorem [6, VII 4.11]) and we have the locale $L^{\mathcal{O}(I)}$ of paths in $L$.

The localic digraph $\vec{L}$ of $L$ is the tuple $(L^{\mathcal{O}(I)}, L, e_0, e_1)$ where $e_0, e_1$ are
obtained by applying the functor $L^{(-)}$ to the endpoint inclusions
$\mathcal{O}(i_0), \mathcal{O}(i_1) : 2 \to \mathcal{O}(I)$ induced by their analogues in Top. Once again, $\vec{L}$ corresponds
to the functor $S_L$ defined in Subsect. 2.3. In this way, since Loc has all small
limits, we have the locale $\Gamma@L$ of gestures with skeleton $\Gamma$ and body in $L$. This
definition coincides with that given in [1].

As in the case of topological spaces, there is a realization induced by the
inclusions $\mathcal{O}(i_0), \mathcal{O}(i_1) : 2 \to \mathcal{O}(I)$, and the realization of a digraph coincides
with the locale of opens of the geometric realization in Top.

⁸ A digraph is locally finite if each vertex is the tail or head of only finitely many
arrows.

Example 3. Let $\Gamma$ be the digraph of Example 1. The realization $|\Gamma|$ in Loc
corresponds to the locale $\mathcal{O}(|\Gamma|)$, whose elements were already described in
Example 2. □

Also, we have a reduction to exponentials, namely an isomorphism of locales

$$L^{\mathcal{O}(|\Gamma|)} \cong \Gamma@L,$$

natural in $L$, for each locally finite digraph $\Gamma$.

Locales are the objects of study of pointless topology, an approach to
a great extent derived from Grothendieck’s vision of the notion of topos
as a generalization of that of topological space. Locales are in some respect
residues of topoi, but they exemplify transparently the spatial aspect of topoi.
In the first instance, locales need not be characterized by their points, and there are
examples (complete boolean algebras without atoms) of locales that are
non-trivial and without points at all! As a collateral effect, the objects of gestures on
these complete boolean algebras are also non-trivial and with no points.

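Example 4. Let $\mathcal{O}(\mathbb{R})_{\neg\neg}$ be the sublocale of $\mathcal{O}(\mathbb{R})$ induced by the double negation
nucleus. The elements of $\mathcal{O}(\mathbb{R})_{\neg\neg}$ are the opens $U \in \mathcal{O}(\mathbb{R})$ for which $\mathrm{Int}(\overline{U}) = U$.
The locale $\mathcal{O}(\mathbb{R})_{\neg\neg}$ is a boolean algebra without atoms and hence has no points.
In the same way, if $\Gamma$ is any non-initial digraph, according to [1, Proposition 4],
the space of points of $\Gamma@\mathcal{O}(\mathbb{R})_{\neg\neg}$ is homeomorphic to the space of gestures with
skeleton $\Gamma$ and body in the space of points of $\mathcal{O}(\mathbb{R})_{\neg\neg}$, but the latter is the empty
space, and hence the space of points of $\Gamma@\mathcal{O}(\mathbb{R})_{\neg\neg}$ is empty. However, it can
be shown that $\mathcal{O}(\mathbb{R})_{\neg\neg}$ is a retract of $\Gamma@\mathcal{O}(\mathbb{R})_{\neg\neg}$ and hence $\Gamma@\mathcal{O}(\mathbb{R})_{\neg\neg}$ is not
a trivial locale. In particular, if $\Gamma$ is the digraph $\bullet \to \bullet$, the locale
$\mathcal{O}(\mathbb{R})_{\neg\neg}^{\mathcal{O}(I)} = \Gamma@\mathcal{O}(\mathbb{R})_{\neg\neg}$ of paths has no points. □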

This is a fundamental example for abstract gesture theory since it shows
that the notion of an individual gesture is insufficient if a theory of gestures on
generalized spaces is desired. Besides, if we want to define a correct generalization
of gestures on locales, then it is impossible to give a satisfactory definition of
a gesture with skeleton $\Gamma$ and body in $\mathcal{O}(\mathbb{R})_{\neg\neg}$ as a morphism of digraphs
$\delta : \Gamma \to \vec{\mathcal{O}(\mathbb{R})_{\neg\neg}}$, since both the locale of paths $\mathcal{O}(\mathbb{R})_{\neg\neg}^{\mathcal{O}(I)}$ and $\mathcal{O}(\mathbb{R})_{\neg\neg}$ have no
points: the object $\vec{\mathcal{O}(\mathbb{R})_{\neg\neg}}$ is not a digraph, but an internal digraph in Loc
whose underlying digraph (Subsect. 2.2) has no vertices and no arrows!

This generalized notion of space (locales) that is concerned with notions of
neighborhoods and coverings rather than points should be taken into account to
model the space-time in different ways than usual. It is absolutely legitimate to ask
whether the euclidean models $\mathbb{R}^n$ and their derivatives (as the interval object $I$),
and even topological spaces, which are essentially characterized by their points,
are suitable to describe processes that have to do with wraps and indecomposable
movements that occur through non-atomic neighborhoods of space-time, as in the
case of the human body (absolutely indecomposable in terms of points!) or the
pianist’s hand⁹. Probably, it is time for a new topology, closer to Grothendieck’s
ideas of a tame (moderate) topology and a geometry of shapes.

5 Gestures on Topological Categories 

Topological categories are internal categories (see [8, V.7] or [5, B2.3.1] for the
definition) in Top. Roughly speaking, this means that a topological category $\mathbb{K}$
is a tuple $(C_1, C_0, e, d, c, m)$ with $C_1$, $C_0$ topological spaces of arrows and objects
respectively and $e, d, c, m$ continuous operations of unity, domain, codomain,
and composition respectively. Topological categories and internal functors in
Top (which we call topological functors) form a category denoted by Cat(Top)
according to the notation in [5, B2.3.1].

Before explaining the construction of gestures, we mention two basic results 
on limits and exponentials of internal categories that we will need and whose 
justification can be found in [1]. 

Theorem 1. Let $\mathcal{C}$ be a cartesian category. If $E = (E_1, E_0, e', d', c', m')$ is an
internal category in $\mathcal{C}$ such that $E_0$, $E_1$, and the object of composable arrows
$E_2 = E_1 \times_{E_0} E_1$ are exponentiable in $\mathcal{C}$, then $E$ is exponentiable in the category
Cat($\mathcal{C}$) of internal categories in $\mathcal{C}$.

Theorem 2. If $\mathcal{C}$ is a small complete category, then Cat($\mathcal{C}$) is small complete.

Let $I$ be the unit interval in $\mathbb{R}$ and $\mathbb{I} = (E_1, E_0, e', d', c', m')$ the topological
category of the poset $(I, \leq)$, that is,

(i) $(E_1, E_0) = (\{(x, y) \mid x \leq y \text{ in } I\}, I)$;

(ii) $e' : E_0 \to E_1$ is the diagonal, that is, $e'(x) = (x, x)$;

(iii) $d', c' : E_1 \to E_0$ are the first and second projection respectively;

(iv) $E_2 = E_1 \times_{E_0} E_1 = \{((w, z), (x, y)) \in I^2 \times I^2 \mid x \leq y = w \leq z\}$, and
$m' : E_2 \to E_1$ is defined by $m'((y, z), (x, y)) = (x, z)$; and

(v) the set $E_0 = I$ has the usual topology on $I$, $E_1$ is a subspace of $I \times I$
(product topology), and $E_2$ is a subspace of $I^4$; so that $e'$ (diagonal), $d', c', m'$
(projections) are continuous.
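
The purely algebraic part of items (i)-(iv) can be checked on a finite sample of the interval. The following sketch is an illustration under stated assumptions: it replaces the unit interval by the grid {0, 1/4, 1/2, 3/4, 1} with exact rational arithmetic, ignores the topology of item (v), and uses hypothetical names `identity`, `dom`, `cod`, and `compose` for e', d', c', and m'.

```python
from fractions import Fraction
from itertools import product

# A finite sample of the unit interval, with exact arithmetic.
grid = [Fraction(i, 4) for i in range(5)]

# Arrows of the poset category: pairs (x, y) with x <= y.
arrows = [(x, y) for x, y in product(grid, repeat=2) if x <= y]

def identity(x):
    """e'(x) = (x, x), the diagonal."""
    return (x, x)

def dom(f):
    """d' is the first projection."""
    return f[0]

def cod(f):
    """c' is the second projection."""
    return f[1]

def compose(g, f):
    """m'((y, z), (x, y)) = (x, z); only defined when cod(f) == dom(g)."""
    if cod(f) != dom(g):
        raise ValueError("arrows are not composable")
    return (dom(f), cod(g))

composable = [(g, f) for g, f in product(arrows, repeat=2) if cod(f) == dom(g)]

unit_laws = all(compose(f, identity(dom(f))) == f == compose(identity(cod(f)), f)
                for f in arrows)
associativity = all(compose(k, compose(g, f)) == compose(compose(k, g), f)
                    for k, g, f in product(arrows, repeat=3)
                    if cod(f) == dom(g) and cod(g) == dom(k))
print(len(arrows), len(composable), unit_laws, associativity)
```

Since an arrow is determined by its endpoints, the unit and associativity laws hold automatically; the check simply confirms that the formulas for e', d', c', and m' fit together as stated.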

To show that $\mathbb{I}$ is exponentiable in Cat(Top) we check the conditions of
Theorem 1: in fact, $E_0, E_1, E_2$ are exponentiable in Top, that is, locally compact,
since they are closed subsets of some finite power of $I$, the latter being locally
compact since finite products of locally compact spaces are locally compact.

Also, we have two endpoint inclusions into $\mathbb{I}$. In fact, note that the terminal
category $1 = (\{*\}, \{*\}, \mathrm{id}, \mathrm{id}, \mathrm{id}, !)$ is the terminal object in Cat(Top). The
internal functors $\alpha, \beta : 1 \to \mathbb{I}$ are defined by $\alpha_0(*) = 0$, $\beta_0(*) = 1$, $\alpha_1(*) =
(0, 0)$, and $\beta_1(*) = (1, 1)$.


⁹ I borrowed this idea from Octavio Agustin-Aquino.

Given a topological category $\mathbb{K}$, the corresponding internal digraph $\vec{\mathbb{K}}$ of $\mathbb{K}$ in
Cat(Top) is the tuple $(\mathbb{K}^{\mathbb{I}}, \mathbb{K}, e_0, e_1)$, where $\mathbb{K}^{\mathbb{I}}$ is the category of all topological
functors from $\mathbb{I}$ to $\mathbb{K}$ with its set of objects $P_0$ (that is, of topological functors)
topologized as a subspace of $C_1^{E_1} \times C_0^{E_0}$ and its set of morphisms $P_1$ (that is, of
natural transformations) topologized as a subspace of $P_0 \times P_0 \times C_1^{E_0}$, and

$$e_i : \mathbb{K}^{\mathbb{I}} \to \mathbb{K}, \qquad F \mapsto F(i), \qquad (\tau : F \to G) \mapsto (\tau_i : F(i) \to G(i)),$$

for $i = 0, 1$. This internal digraph $\vec{\mathbb{K}}$ corresponds to the functor $S_{\mathbb{K}}$ defined in
Subsect. 2.3, so since Cat(Top) is small complete by Theorem 2, for each digraph
$\Gamma$, we have the topological category of gestures $\Gamma@\mathbb{K}$ with skeleton $\Gamma$ and body
in $\mathbb{K}$. This definition is essentially the same as that given in [11], where applications of
gestures on topological categories in mathematical music theory are discussed.


Example 5. Let $\Gamma$ be a loop digraph as in the picture

[Picture: a single vertex $x$ with one loop arrow $a$.]


Let us make an explicit computation of the topological category $\Gamma @ \mathbb{K}$ for any
topological category $\mathbb{K} = (C_1, C_0, e, d, c, m)$. First, note that according to the
definition of $\Gamma @ \mathbb{K}$, it is the equalizer of the diagram

$e_1, e_0 : \mathbb{K}^{\mathbb{I}} \rightrightarrows \mathbb{K}$.

Thus, $\Gamma @ \mathbb{K}$ can be described as follows:

(i) Its objects are topological functors $F : \mathbb{I} \to \mathbb{K}$, that is, pairs
$(F_1, F_0) \in C_1^{E_1} \times C_0^{E_0}$ (correspondence on morphisms and on objects) satisfying the
functor conditions and $F_0(0) = F_0(1)$. In this way, the set of objects of
$\Gamma @ \mathbb{K}$ is equipped with the subspace topology of the Tychonoff topology on
the product $C_1^{E_1} \times C_0^{E_0}$. Here, $C_1^{E_1}$ and $C_0^{E_0}$ are function spaces, which are
endowed with the compact-open topology.

(ii) A morphism from $F$ to $G$, where $F$ and $G$ are topological functors as in
(i), is a triple $(F, G, \tau)$, where $\tau : F \to G$ is a natural transformation
such that $\tau_0 : F_0(0) \to G_0(0)$ and $\tau_1 : F_0(1) \to G_0(1)$ are the same
morphism. Here we regard $\tau$ as a continuous map from $I$ to $C_1$ satisfying the
usual natural transformation conditions. In this way, the set of morphisms
of $\Gamma @ \mathbb{K}$ is endowed with the subspace topology of the Tychonoff topology
on the product

$C_1^{E_1} \times C_0^{E_0} \times C_1^{E_1} \times C_0^{E_0} \times C_1^{E_0}$. □
6 Diagrams: Gestures on Categories

Let Cat be the category of all small categories, which coincides with Cat(Set).
Consider the functor $T : G_1 \to \mathbf{Cat}$ identified with the pair of functors $F_0, F_1$
from the terminal category 1 (just an object and an arrow) to the category of
the poset $\{0 < 1\}$, where $F_0(*) = 0$ and $F_1(*) = 1$ (see Fig. 1). Since Cat is
small cocomplete (Exercise 5 in [7, p. 112]), we know that the realization $|\_|_T$
exists according to Subsect. 2.1, but we require a more explicit presentation.
Recall (Subsect. 2.1) that $|\_|_T$ is left adjoint to the functor $\mathbf{Cat}(T, \_) : \mathbf{Cat} \to \mathbf{Digraph}$,
which is essentially the forgetful functor. But we know that the latter has a
left adjoint (unique up to isomorphism), namely the free category functor Path
(see [7, II.7]), so we can assume that $|\_|_T = \mathrm{Path}$.

Now Cat is cartesian closed by Theorem 1, the categories of functors being
the exponentials, so given a category $\mathscr{C}$, we have the internal digraph $S$ from
Subsect. 2.3. In this way, we have the category $\Gamma @ \mathscr{C}$ of gestures with skeleton $\Gamma$
and body in $\mathscr{C}$. The interesting fact here is that the reduction to exponentials
always holds, that is, we have an isomorphism of categories

$\Gamma @ \mathscr{C} \cong \mathscr{C}^{\mathrm{Path}(\Gamma)}$

for any digraph $\Gamma$. Therefore, the category $\Gamma @ \mathscr{C}$ of gestures with skeleton $\Gamma$
and body in $\mathscr{C}$ can be identified with the category of all functors from the free
category $\mathrm{Path}(\Gamma)$ to $\mathscr{C}$. So diagrams are gestures!

Example 6. Let $\Gamma$ be the digraph $\bullet_x \xrightarrow{a} \bullet_y$. Its realization in Cat is its free
category, which is the category with just an arrow plus identities and can be
depicted as

[Picture: the objects $x$ and $y$ with the arrow $a : x \to y$ and the two identity arrows.]

Note that this category is isomorphic to the category of the poset $\{0 < 1\}$.
Moreover, given a small category $\mathscr{C}$, the category $(\bullet_x \xrightarrow{a} \bullet_y) @ \mathscr{C}$ is precisely the
category of functors from the category of the poset $\{0 < 1\}$ to $\mathscr{C}$. Thus, the
objects of $(\bullet_x \xrightarrow{a} \bullet_y) @ \mathscr{C}$ are essentially morphisms of $\mathscr{C}$, and a morphism of
$(\bullet_x \xrightarrow{a} \bullet_y) @ \mathscr{C}$ from $f : A \to B$ to $g : C \to D$ is just a pair of morphisms
$(h : A \to C, k : B \to D)$ such that $kf = gh$. That is, our category of gestures is
the category of morphisms of $\mathscr{C}$. Note that this also exemplifies the exponential
reduction. □
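The commuting-square condition $kf = gh$ of Example 6 can be checked mechanically in a toy setting. The following is only a minimal sketch, assuming the body category is that of finite sets with maps encoded as Python dictionaries; the names `compose` and `is_arrow_morphism` and the sample maps are hypothetical and serve purely as illustration.

```python
def compose(g, f):
    """Return the composite g after f for finite maps given as dicts."""
    return {a: g[f[a]] for a in f}


def is_arrow_morphism(f, g, h, k):
    """Check k.f == g.h, i.e. that (h, k) is a morphism from f to g."""
    return compose(k, f) == compose(g, h)


# Two objects of the arrow category (illustrative data only).
f = {0: "u", 1: "v"}      # f : A -> B
g = {"p": 10, "q": 20}    # g : C -> D

# A candidate morphism (h, k) from f to g, and a non-example.
h = {0: "p", 1: "q"}      # h : A -> C
k = {"u": 10, "v": 20}    # k : B -> D
k_bad = {"u": 20, "v": 10}

print(is_arrow_morphism(f, g, h, k))
print(is_arrow_morphism(f, g, h, k_bad))
```

In this encoding a gesture with skeleton $\bullet_x \xrightarrow{a} \bullet_y$ is just one dictionary, and a morphism of gestures is a pair of dictionaries passing the check.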

7 Gestures on Linear Categories

Let $R$ be a commutative ring. We define an $R$-linear category to be a category $\mathscr{M}$
with small hom-sets such that for each pair $A, B$ of objects of $\mathscr{M}$ the set
of morphisms $\mathscr{M}(A, B)$ is an $R$-module and such that for each triple $A, B, C$
of objects of $\mathscr{M}$ the composition $\circ : \mathscr{M}(B, C) \times \mathscr{M}(A, B) \to \mathscr{M}(A, C)$ is $R$-bilinear.
Given two $R$-linear categories $\mathscr{M}, \mathscr{N}$, an $R$-linear functor from $\mathscr{M}$ to
$\mathscr{N}$ is a functor $F : \mathscr{M} \to \mathscr{N}$ such that for each pair $A, B$ of objects of $\mathscr{M}$ the
function $F : \mathscr{M}(A, B) \to \mathscr{N}(F(A), F(B))$ is an $R$-homomorphism of modules.
In this way, we have the category $\mathbf{Cat}_R$ of all small $R$-linear categories and
$R$-linear functors between them. On the other hand, an ideal of an $R$-linear
category $\mathscr{M}$ consists of a family of subgroups $\mathscr{I}(A, B) \subseteq \mathscr{M}(A, B)$ indexed by all
pairs of objects of $\mathscr{M}$ such that $f \in \mathscr{I}(A, B)$ implies $gfe \in \mathscr{I}(D, C)$ for all
$e \in \mathscr{M}(D, A)$ and $g \in \mathscr{M}(B, C)$.

Now let $k$ be a commutative field. We say that a small $k$-linear category $\mathscr{M}$
is a spectroid if the non-invertible morphisms of $\mathscr{M}$ form an ideal $\mathrm{Rad}(\mathscr{M})$ of $\mathscr{M}$
and if distinct objects of $\mathscr{M}$ are not isomorphic. It can be shown that the
first requirement is equivalent to saying that the $k$-algebras $\mathscr{M}(A, A)$ are local¹⁰
for all $A \in \mathrm{Ob}(\mathscr{M})$.

A construction of free $R$-linear categories is possible in much the same way
as in the case of free modules in $\mathbf{Mod}_R$. In fact, there is a functor
$R(\_) : \mathbf{Cat} \to \mathbf{Cat}_R$ which is left adjoint to the forgetful functor from $\mathbf{Cat}_R$ to $\mathbf{Cat}$.
Given a small category $\mathscr{C}$, the $R$-linear category $R\mathscr{C}$ has as objects the objects
of $\mathscr{C}$; for each pair of objects $A, B$ the set $R\mathscr{C}(A, B)$ is defined to be the free
module $R^{(\mathscr{C}(A, B))}$ on $\mathscr{C}(A, B)$, and the composition is the linear extension of the
composition in $\mathscr{C}$. We thus have the functor

$RT : G_1 \xrightarrow{T} \mathbf{Cat} \xrightarrow{R(\_)} \mathbf{Cat}_R$,

where $T$ is the functor in Sect. 6; see Fig. 1 for a picture. Further, the realization
$|\_|_{RT}$ coincides with $R(\_) \circ \mathrm{Path}$ since $|\_|_T = \mathrm{Path}$ and $R(\_)$, as a left adjoint,
preserves colimits.

Given an $R$-linear category $\mathscr{M}$, so as to construct gestures, we consider the
external digraph of $\mathscr{M}$ (contravariant functor, Subsect. 2.4)

$G_1 \xrightarrow{RT} \mathbf{Cat}_R \xrightarrow{\mathbf{Cat}_R(\_, \mathscr{M})} \mathbf{Set}$,

rather than an internal digraph in $\mathbf{Cat}_R$. So since the functor $\mathbf{Cat}_R(\_, \mathscr{M})$
transforms colimits to limits and the functor $R(\_) \circ \mathrm{Path}$ is left adjoint to the
forgetful functor $U : \mathbf{Cat}_R \to \mathbf{Cat} \to \mathbf{Digraph}$, we have the bijections

$\Gamma @ \mathscr{M} \cong \mathbf{Cat}_R(|\Gamma|_{RT}, \mathscr{M}) = \mathbf{Cat}_R(R\,\mathrm{Path}(\Gamma), \mathscr{M}) \cong \mathbf{Digraph}(\Gamma, U(\mathscr{M}))$.

Note that since right adjoints are unique up to natural isomorphism, the set

$\mathbf{Digraph}(\Gamma, U(\mathscr{M}))$

is essentially the set of gestures defined in Subsect. 2.4. Moreover, this set of gestures
is strongly related to formulas. The difference is that formulas are defined
for spectroids $\mathscr{M}$ and that the arrows of the codomain of a formula are only
allowed to be non-invertible morphisms of $\mathscr{M}$. A similar result should express
formulas as gestures. The best situation would be when the functor Rad (see
[10, p. 39]) from spectroids to digraphs has a left adjoint¹¹; in such a case, using

10 That is, local rings: all non-invertible elements form a two-sided ideal. 

11 The author does not know whether or not such a left adjoint exists.
the same reasoning as above, this left adjoint could be regarded as a realization
such that the associated set of gestures with skeleton $\Gamma$ and body in a
spectroid $\mathscr{M}$ would be isomorphic (as in the above isomorphism) to

$\mathbf{Digraph}(\Gamma, \mathrm{Rad}(\mathscr{M}))$,

that is, to the set of formulas! Now the functor $R(\_) \circ \mathrm{Path}$ is a naive candidate
for such an adjoint, but the images of the functor $R(\_) \circ \mathrm{Path}$ are not spectroids
in general, as discussed in the following example, and hence we discard it.

Example 7. If $\Gamma$ is a loop (see Example 5), then the realization $R\,\mathrm{Path}(\Gamma)$
is isomorphic to the polynomial ring $R[x]$, which is never local since $1 - x$ and $x$
are non-invertible with $1 = (1 - x) + x$ invertible. This shows that $R\,\mathrm{Path}(\Gamma)$ is
not a spectroid.

However, if $R$ is a field $k$, the quotient algebra $k[x]/(x^2)$, which can be identified
with the algebra of dual numbers, is local with ideal of non-invertible
elements generated by the equivalence class of $x$. Thus, $k[x]/(x^2)$, regarded as
the set of morphisms of a category with just an object, is a spectroid.

In this way, a gesture with skeleton a loop and body in the linear category
$k[x]/(x^2)$ is just the choice of an equivalence class $[a + bx]$ in $k[x]/(x^2)$. In contrast,
a formula in the spectroid $k[x]/(x^2)$ is the choice of a class of the form $[bx]$. For
instance, the element $[x]$ is a formula, which can be interpreted as the element
$x$ subject to the condition $x^2 = 0$; hence the relation with the intuitive idea of
a formula. Finally, note that the class of the unit of $k$ is a gesture that is not a
formula. □
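The distinction between gestures and formulas in Example 7 can be made concrete with a small computation. The sketch below assumes $k = \mathbb{Q}$ (modelled with `fractions.Fraction`); the class `Dual` and its method names are hypothetical illustrations, not notation from the text. It shows that $[a + bx]$ is invertible exactly when $a \neq 0$, so the non-invertible classes $[bx]$ (the formulas) are closed under addition, as locality requires.

```python
from fractions import Fraction


class Dual:
    """Class [a + b x] in k[x]/(x^2), with k assumed to be the rationals."""

    def __init__(self, a, b):
        self.a, self.b = Fraction(a), Fraction(b)

    def __add__(self, other):
        return Dual(self.a + other.a, self.b + other.b)

    def __mul__(self, other):
        # (a + b x)(c + d x) = ac + (ad + bc) x, because x^2 = 0.
        return Dual(self.a * other.a, self.a * other.b + self.b * other.a)

    def __eq__(self, other):
        return (self.a, self.b) == (other.a, other.b)

    def is_invertible(self):
        return self.a != 0

    def is_formula(self):
        # Formulas are the non-invertible classes, i.e. those of the form [b x].
        return not self.is_invertible()

    def inverse(self):
        if not self.is_invertible():
            raise ZeroDivisionError("classes of the form [b x] have no inverse")
        return Dual(1 / self.a, -self.b / self.a ** 2)


ONE = Dual(1, 0)      # the class of the unit of k: a gesture, not a formula
X = Dual(0, 1)        # the class [x]: a formula
u = Dual(2, 3)        # a gesture that is not a formula

print(u * u.inverse() == ONE)
print((X + Dual(0, 5)).is_formula(), (X * X) == Dual(0, 0))
```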

8 Final Comments 

Further generalization 

The formal definition of gestures in Subsect. 2.2 was deliberately chosen in this
form to illustrate the several possibilities of generalizing it. The category $G_1$
can be replaced by the semi-simplicial category so that we can define gestures
whose skeleta are semi-simplicial sets $\Gamma$ with respect to semi-simplicial objects. In
that case we can regard digraphs as particular examples of semi-simplicial sets
and hence the resultant theory generalizes that for digraphs. This is not only
a mathematical fantasy; these generalizations have relevant consequences in the
theory of gestures for digraphs. For example, in the case of topological spaces,
the space of hypergestures $\Gamma' @ \Gamma @ X$, where $\Gamma', \Gamma$ are locally finite digraphs and
$X$ is a space, satisfies

$\Gamma' @ \Gamma @ X \cong X^{|\Gamma'| \times |\Gamma|} \cong X^{|\Gamma' \times_g \Gamma|}$,

where $\Gamma' \times_g \Gamma$ is the geometric product of the digraphs $\Gamma'$ and $\Gamma$, which is
usually a semi-simplicial set rather than a digraph. This fact also exemplifies
the combinatorial nature of topological hypergestures with locally finite skeleta:
they basically depend on the digraphs, not on the particular space! Furthermore,
the above formula is also valid for gestures on locales.
Gestures and Kan Extensions

The formulas defining objects of gestures and realizations show that the realization
functor $|\_|_T$ and the gesture functor $\_ @ S$ are left and right Kan extensions
respectively. In fact, note first that the category of elements $\int \Gamma$ is isomorphic
to the comma category $Y \downarrow \Gamma$ and that $(\int \Gamma)^{op}$ is isomorphic to $\Gamma \downarrow Y^{op}$, where
$Y : G_1 \to \mathbf{Set}^{G_1^{op}}$ is the Yoneda embedding. Thus, from the definitions of
realization and gestures in Subsects. 2.1 and 2.2, we obtain the formulas

$|\Gamma|_T = \mathrm{Colim}\,(Y \downarrow \Gamma \to G_1 \xrightarrow{T} \mathscr{C}) = \mathrm{Lan}_Y(T)(\Gamma)$

and

$\Gamma @ S = \mathrm{Lim}\,(\Gamma \downarrow Y^{op} \to G_1^{op} \xrightarrow{S} \mathscr{C}) = \mathrm{Ran}_{Y^{op}}(S)(\Gamma)$.

This means, according to Theorem 1 in [7, X.3], that

the realization functor $|\_|_T$ is the left Kan extension of $T$ along the Yoneda
embedding and, dually, the contravariant gesture functor $\_ @ S$ is the right
Kan extension of $S$ along the opposite of the Yoneda embedding.

This fact helps to locate gesture theory as a particular case of the theory of
Kan extensions. Then we have a notion of preservation of gestural structures as
shown, for example, by the formula

$\mathrm{pt}(\Gamma @ L) \cong \Gamma @ \mathrm{pt}(L)$,

which says that the space of points of the locale of gestures with skeleton $\Gamma$ and
body in a locale $L$ is homeomorphic to the space of gestures with skeleton $\Gamma$
and body in the space of points $\mathrm{pt}(L)$. Moreover, this viewpoint helps to give a
definition of gestures, based exclusively on Kan extensions, that need not deal
with limits, that is, there may be objects of gestures that are not pointwise Kan
extensions¹².

From the diamond to a category 

It is important to make clear that we are not claiming a solution to the so-called
‘diamond conjecture’; instead, we consider that it has not yet been formulated
in a correct way. We hope that the piece of theory presented in
this article is useful for giving a more theoretical shape to the diamond diagram
[10, p. 43]. In the initial diamond¹³, the two vertices related to the category of
gestures and the category of formulas should correspond to two particular realizations
of the category of digraphs (or semi-simplicial objects, if we are more daring).

12 Though probably the more interesting objects to study are the pointwise Kan extensions
and hence the realizations and gesture objects defined by means of (co)limits
as above.

13 Which was not precisely a diamond since it is noted there that there is a possible
framework for formulas for each field $k$.
For gestures this is done, but not yet for formulas, though we are close. Moreover, the
particular notions of gestures can be compared since we have a notion of preservation
of gestural structures, taken from the theory of Kan extensions. Thus we
have a category of gestural structures which could be useful to find a precise
adjunction between gestures and formulas, allowing us to recover the gestures
behind formulas and the formulas behind gestures.
Abstract. Mathematical Music Theory (MaMuTh) can be understood
as a creative support of the musical ontology, a toolset for composition,
or a model for theoretical approaches. Several MaMuTh scholars who
are also musicians have asked about the opposite possibility, a Musical
Math Game (MuMaGm), namely the creative musical support of
the mathematical ontology, setting up conjectures and mathematical theories
and eventually helping to solve mathematical problems. We discuss this
idea and our related proposal of music and mathematics being adjoint
functors between (the categories of) formulas and gestures. We illustrate
this bidirectional ontological shift of creativity between music and mathematics
through the history of counterpoint.


Keywords: Creativity • Mathematical Music Theory •
Musical mathematics game • Ontology • Counterpoint


1 Introduction 

The recent success of Mathematical Music Theory (MaMuTh) has always been
relativized by the failure of an “adjoint” movement that one could call Musical
Math Game (MuMaGm), a musical movens behind mathematical theory. This
caveat has also been put forward (orally) in different ways by my fellow scholars,
namely (among others) Moreno Andreatta, Emilio Lluis Puebla (“Mathematics
and music are both fine arts.”), and Octavio Alberto Agustín-Aquino.
Historically, this imbalance of MaMuTh versus MuMaGm can also be traced to
Leibniz’s statement that “Musica est exercitium arithmeticae occultum nescientis
se numerare animi.” Accordingly, music is only a superficial activity that
is driven and caused by a hidden mathematical machine or program. In this
perspective, MaMuTh makes perfect sense and MuMaGm doesn’t.

The MuMaGm perspective is, however, not nonexistent, at least as a philosophical
approach. The most important philosopher of German romanticism,
Georg Philipp Friedrich von Hardenberg, aka Novalis, in his philosophical writings
(Fragmente - Kapitel 10) says: “Aller Genuss ist musikalisch, mithin
mathematisch.” (All enjoyment is musical, consequently mathematical.) He explicitly
invokes a “musical mathematics.” The English poet and mathematician
James Joseph Sylvester in his Philosophical Transactions called mathematics the
“music of reason”. Leibniz’s famous words could be counterbalanced by
“Mathematica est ludus musicalis animi se nescientis” (Mathematics is a musical game
of the unconscious soul.).

From a more mathematical perspective, Guerino Mazzola has argued [10] that
music and mathematics could be mutually adjoint functors between gestures and
formulas:

$\text{formulas} \; \underset{\text{mathematics}}{\overset{\text{music}}{\rightleftarrows}} \; \text{gestures}$

In this wording, mathematics would no longer be the in-depth structure
whose surface produces music. Both fields would be different, but balanced and
interdependent movements of human expressivity. Of course, the above setup
is not strictly mathematical since one would have to specify the categories of
formulas and gestures. We will make this topic more precise in Sect. 3.

Although MuMaGm seems to share some reality, one major reason for its
historiographic absence is that, as opposed to theories, games are rarely documented
if their rules are difficult to explain. The only explicit textual reference
to such a game is, ironically, Hermann Hesse’s novel Das Glasperlenspiel, published
in 1943, which describes a fictitious game of the future, where mathematics
and music would interact as in supreme human intellectuality. We shall discuss
a historical example of MuMaGm when tracing the development of counterpoint
in Sect. 4.

In this paper we want to investigate MaMuTh and its adjoint MuMaGm
from a particularly important point of view, namely as creative processes that
are characterized by a switch of ontologies in the following sense. Creativity
in MaMuTh starts with a situation of the musical ontology, be it a type of
musical structure, such as chords, intervals, or motives, a task of musical composition,
or a question regarding sound colors, etc. One then transfers such ontological
instances to the ontology of mathematics, thereby generating calculations,
formulas, theorems, or mathematical models, which then, when transferred
back to the corresponding musical entities, generate a creative musical output.
The essential point of this type of creativity resides in the switch of ontologies.
This MaMuTh creativity restates musical instances in a powerful mathematical
ontology and provides via mathematical procedures a background for the musical
output. In Sect. 2, we want to discuss such creative actions: one example
from musical composition, and two examples from the prehistory and history of
counterpoint.

In the concluding Sect. 5, we want to present the next big step for a future
counterpoint, a step that is enabled by theories and software for contrapuntal
composition based upon different interval concepts and also extensions to
microtonal pitch systems.

Despite the philosophical flavor of our paper, we propose a number of operational
and experimental initiatives in favor of a deeper understanding of the
creative MuMaGm switch.
2 Three Examples of MaMuTh Creativity

It is evident that our discourse will not focus on the role of music as a tool for
daydreams, a role which is acceptable and was important for Albert Einstein’s
creative work. For him, music was only a useful tool for inspiration and intuition,
not more than an ornament of psychological relaxation. Our discussion of
creativity with music aims at an understanding of its balanced interplay with
mathematics. Before we embark on the discussion of creativity we should recall
the definition of a creative process that was developed in [11] and has been
applied, among others, in the field of computational creativity research [4].

In [11], creativity is defined as a process that comprises seven successive steps: 

1. Exhibit an open question 

2. Define its (semiotic) context 

3. Find a core concept 

4. Describe its ‘walls’ 

5. Soften the walls 

6. Extend them 

7. Evaluate the extended concept 

This in particular describes a process of a semiotic nature. The context of 
the open question (not “problem”, but “question”, which is less restrictive) is a 
semiotic system (step 2). At the end of the creative process we have produced 
new signs that extend the system. Creativity is a strict extension of a semiotic 
system via new signs that result from extended conceptual walls (step 6). This 
is the reason why computational creativity has to solve the hard problem of 
computational semiotics as a preliminary conditio sine qua non. 

The important concept here is “wall”. Let us explain this. A concept has its
determining attributes. They delimit the concept from others. For example, in
pre-Einsteinian physics, the Newtonian concept of time was a real number that
was essentially the same for all inertial systems; it was “God’s one and only time.”
The critical wall of this concept was that it is a singular noun. Time was not
understood as something that could have a plural. Einstein’s creative extension
was to admit that time could have a plural case, to admit a plurality of times,
one for each inertial system, and to describe the transformation between such
time instances by the Lorentz transformation. The point here is to understand
that this creative Einsteinian switch from singular to plural was difficult because
the wall was thought to be an essential attribute of the very concept of time.

For musical practice, the creative process looks like a loop that successively 
improves the relevant concepts, as shown in Fig. 1. We come back to this loop 
in Sect. 5. 

A last preliminary example of creativity is the following theorem from algebraic
topology, which we shall use in Sect. 3:

Theorem 1. Every group is isomorphic to a fundamental group $\pi_1(X)$ of a
topological space $X$.

Fig. 1. The creative loop of musical performance.


The creative point here is that the abstract concept of a group is covered by
the concept of the homotopy class of a closed curve in $X$, which comprises an
“elastic collection” of curves. It would seem that algebraic abstraction is opposed
to the imprecise object of a homotopy class of curves. This theorem proves that
this is wrong. So the wall “abstraction” must be opened to include fuzzy objects.


2.1 MaMuTh for Composition: Beethoven’s “Hammerklavier 
Sonate” Revisited 

This example refers to Mazzola’s sonata op. 3 (l’essence du bleu) [6], an Allegro
movement that was composed following a detailed analysis of the Allegro movement
of Beethoven’s op. 106 (Hammerklavier). The wall in the musical ontology
was the impossibility of being more creative than Beethoven in his sonata, which is
accepted as a most difficult and musically also unsurpassable creation. Mazzola’s
challenge was to break down this wall. This was done by an ontological switch
from music to mathematics. The mathematical analysis of the sonata’s harmonic
and motivic architecture revealed a mathematical group, namely the symmetry
group $\mathrm{Sym}_{\mathbb{Z}}(C\sharp^{-7})$ of the diminished seventh chord $C\sharp^{-7} = \{C\sharp, E, G, A\sharp\}$,
in the role of defining all possible tonal modulations as well as the motivic kernels;
see [7, Ch. 28.2] for details.

In the environment of the mathematical theory of groups, it was quite
straightforward to envisage another group, namely the symmetry group
$\mathrm{Sym}_{\mathbb{Z}}(C\sharp^{+})$ of the augmented triad $C\sharp^{+} = \{C\sharp, F, A\}$. These two symmetry
groups of chords are fundamentally related to the Sylow groups $\mathbb{Z}_4$ and $\mathbb{Z}_3$ of
the pitch class group $\mathbb{Z}_{12} \cong \mathbb{Z}_4 \times \mathbb{Z}_3$. It was therefore mathematically compelling to
replace the group of op. 106 by the group $\mathrm{Sym}_{\mathbb{Z}}(C\sharp^{+})$ with the aim of building
a modulatory and motivic architecture in analogy to Beethoven’s architecture
that is derived from $\mathrm{Sym}_{\mathbb{Z}}(C\sharp^{-7})$. Therefore the ontological switch (as shown
in Fig. 2) to mathematics enabled a musical creativity in the composition of a
new sonata Allegro movement, Mazzola’s op. 3. This composition would have
been psychologically and creatively impossible if starting directly from op. 106,
this Mount Everest of sonata compositions.
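The relation between the two chords and the Sylow subgroups of $\mathbb{Z}_{12}$ can be illustrated numerically. The following minimal sketch assumes the usual encoding of pitch classes by integers modulo 12 with C = 0 (so C♯ = 1); the helper names `subgroup` and `coset` are hypothetical. It only verifies that the diminished seventh chord is a coset of the subgroup of order 4 and the augmented triad a coset of the subgroup of order 3; it does not compute the symmetry groups $\mathrm{Sym}_{\mathbb{Z}}$ themselves.

```python
N = 12  # pitch classes modulo the octave, with C = 0 (an encoding assumption)


def subgroup(step):
    """Cyclic subgroup of Z_12 generated by `step`."""
    return frozenset(range(0, N, step))


def coset(offset, group):
    """Translate a subgroup by `offset` inside Z_12."""
    return frozenset((offset + h) % N for h in group)


SYLOW_2 = subgroup(3)   # {0, 3, 6, 9}, isomorphic to Z_4
SYLOW_3 = subgroup(4)   # {0, 4, 8},    isomorphic to Z_3

C_SHARP = 1
dim7 = coset(C_SHARP, SYLOW_2)   # C#, E, G, A#
aug = coset(C_SHARP, SYLOW_3)    # C#, F, A

# Chinese remainder theorem: p -> (p mod 4, p mod 3) is a bijection Z_12 -> Z_4 x Z_3.
torus = {(p % 4, p % 3) for p in range(N)}

print(sorted(dim7), sorted(aug), len(torus))
```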
[Figure 2 labels: op. 106; math ontology; $\mathrm{Sym}_{\mathbb{Z}}(C\sharp^{-7})$; op. 3]

Fig. 2. The creative ontology switch for the composition op. 3 (l’essence du bleu) as
derived from Beethoven’s op. 106 (Hammerklavier).

2.2 The Pythagorean Prehistory of Counterpoint 

This example deals with the prehistory of counterpoint, namely the development
of the concept of consonant or dissonant intervals. We want to include the
time interval from Pythagoras (around 570 B.C.) to the initiation of polyphony
around 900 A.D. We have to start at the initial musical setup, which is the monochord,
on which Pythagoras and his students were “listening to the first principles
of the universe.” We have to be aware that they were hindered by a major wall,
namely the wall of a total absence of acoustics. No Fourier decomposition, overtones
and the like were available in 570 B.C. The ontological switch from music to
mathematics performed by the Pythagoreans was the mathematical interpretation
of musical intervals as found on the monochord’s string: dividing the string
by two produced an agreeable octave, taking two thirds of the string produced
an agreeable fifth, and taking three quarters of the string length produced an
equally agreeable fourth. The musical impression was therefore transferred to a
mathematical structure, the tetractys. The tetractys was the world formula of
ancient Greece. It was the background of all phenomena. This ‘formula’ consists
of ten points (a holy number in ancient Greece) which are arranged in a triangle
of one on top, then two, three, and four points stacked in four rows. The ratios
2/1, 3/2, 4/3 of successive row point numbers were interpreted as the rational
background for monochord sound intervals (Fig. 3).
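The arithmetic of the tetractys described above can be sketched in a few lines. This is only an illustration using exact fractions; the variable names are hypothetical. It computes the ratios of successive rows, the corresponding string fractions on the monochord, and checks that a fifth followed by a fourth spans an octave.

```python
from fractions import Fraction

# Rows of the tetractys: one, two, three and four points (ten points in total).
rows = [1, 2, 3, 4]

# Ratios of successive row sizes: octave 2/1, fifth 3/2, fourth 4/3.
ratios = [Fraction(rows[i + 1], rows[i]) for i in range(len(rows) - 1)]

# Fractions of the monochord string that produce these intervals.
string_fractions = [1 / r for r in ratios]

# Stacking a fifth and a fourth gives an octave.
fifth_plus_fourth = ratios[1] * ratios[2]

print(sum(rows), ratios, string_fractions, fifth_plus_fourth)
```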

This initial switch from musical to mathematical ontology was the basis of the
entire concept architecture of interval constructions. The creativity for musical
intervals was directed by their mathematical representation. The Pythagorean
tuning is built upon the interval types that one may deduce from the prime
numbers 2 and 3, which appear in the tetractys.¹

[Figure 3 labels: Tetractys; 2:1; 3:2; 4:3; math ontology; interval ratios]

Fig. 3. The creative ontology switch for the development of interval categories following
the Pythagorean prehistory.


2.3 From Palestrina-Fux to New Counterpoint Worlds 

A third example of MaMuTh creativity starts with the dichotomy
of consonant versus dissonant intervals established by Palestrina in the 16th century and
then canonized by Johann Joseph Fux in his famous catechism-styled Gradus
ad Parnassum [5]. The consonant intervals modulo octave are prime, minor and
major third, fifth, and minor and major sixth; denote their set by $K$. The other six
intervals are dissonant; their set is denoted by $D$. In particular the fourth, which
was consonant for the Pythagoreans, turned out to be dissonant. We come back to
the time interval from 900 A.D. to Palestrina’s time in Sect. 4.

Starting from the Palestrina-Fux dichotomy $K/D$ of consonances and dissonances,
we are given a musical situation; this dichotomy is the basis of the
contrapuntal theory with its five species of increasing complexity. This theory
is the endpoint of the contrapuntal development since Fux. In fact, students
of music still learn this model in their counterpoint courses. The wall here is
the musicological termination and the related educational dead end. And it is
a real dead end in the sense that the Fux model is not even critically questioned.
Fourths are dissonant, opposed to the Pythagorean and also the physical
approaches, a fact that has also been discussed as an unsolved problem of music
theory by Carl Dahlhaus [3]. Parallels of fifths are forbidden, because they are


1 It was only in the Renaissance that the Pythagorean tetractys was extended to what
may be called Zarlino’s “pentactys”, and which added the prime number 5 as a fifth
row.
boring. Parallels of thirds are less boring? Here psychology is infused into music
theory. But the system of education simply fixes these axioms without any critical
analysis. The wall is this teaching of interval processes as a (historically?)
sanctioned catechism. And, correlated to this rationale, an extension or variation
of the given approach is nearly impossible, except, perhaps, Charles Louis
Seeger’s idea of a dissonant counterpoint [14], which exchanges $K$ and $D$. But
this is not a creative extension at all (Fig. 4).


[Figure 4 labels: Fux/Palestrina interval dichotomy; musical ontology; Music?; five new counterpoint worlds, have software for composition; K-D symmetry d = 5k+2 on third torus; Math; math ontology; Math?; six strong dichotomies for counterpoint theories]

Fig. 4. The creative ontology switch for the Fux-Palestrina classification of intervals
to the present five-world theory of also microtonal intervals.


In view of this wall it was reasonable to repeat the Pythagorean MaMuTh and
transfer the contrapuntal kernel $K/D$ to the mathematical ontology. This has
been realized by Mazzola’s mathematical theory of counterpoint, which now is
described together with its generalizations in collaboration with Agustín-Aquino
and Julien Junod in [1]. This theory exhibits a unique autocomplementarity symmetry
$A(k) = 5k + 2$ on the pitch class group $\mathbb{Z}_{12}$, exchanging the two components
$K$ and $D$. This symmetry is used to model interval successions and in particular
implies the rule of forbidden parallel fifths. The dichotomy $K/D$ is also recognized
as a geometrically distinguished dichotomy on the toroidal interpretation
$\mathbb{Z}_{12} \cong \mathbb{Z}_4 \times \mathbb{Z}_3$ of $\mathbb{Z}_{12}$. Therefore the dissonant fourth is a consequence of this
geometric fact.
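The statement about the autocomplementarity symmetry can be checked by brute force. The sketch below assumes that intervals are encoded by semitone counts modulo 12 (prime = 0, minor third = 3, major third = 4, fifth = 7, minor sixth = 8, major sixth = 9) and that "symmetry" means an invertible affine map $k \mapsto ak + b$ on $\mathbb{Z}_{12}$; the function name `autocomplementarities` is hypothetical.

```python
from math import gcd

N = 12
K = frozenset({0, 3, 4, 7, 8, 9})      # consonances (semitone encoding assumed)
D = frozenset(range(N)) - K            # dissonances: {1, 2, 5, 6, 10, 11}


def autocomplementarities(consonances, dissonances, n=N):
    """All invertible affine maps k -> a*k + b (mod n) sending K onto D."""
    found = []
    for a in range(n):
        if gcd(a, n) != 1:             # a must be a unit for the map to be bijective
            continue
        for b in range(n):
            image = frozenset((a * k + b) % n for k in consonances)
            if image == dissonances:
                found.append((a, b))
    return found


symmetries = autocomplementarities(K, D)
print(symmetries)
```

Under these assumptions the search returns a single pair $(a, b)$, which matches the uniqueness claimed in the text; in particular the fourth (5 semitones) is the image of the minor third under $A$.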

This mathematical theory exhibits five new dichotomies for new worlds of
contrapuntal composition, including the corresponding rules of allowed interval
successions, and is extended to arbitrary microtonal pitch class groups $\mathbb{Z}_{2n}$ for
$n > 2$. The theory has also been implemented in Mazzola’s rubato composer
software by Junod and is now ready for practical compositions. We come back
to this perspective in Sect. 5.
3 Can We Define Math and Music as Adjoint Functors?

The above examples all pertain to the MaMuTh switch. Before we discuss a very
important MuMaGm switch, we should present some mathematical arguments
for a more balanced switch dynamics between mathematics and music.

A first argument relates to Mazzola’s diamond conjecture [8], which involves
two types of categories, the category Formula of formulae and the category
Gesture of gestures. The diamond conjecture argues that there should be a big
category $X$ and two functors $\phi : \mathbf{Formula} \to X$, $\gamma : \mathbf{Gesture} \to X$ which would
close the pair of functors $r : \mathbf{Digraph} \to \mathbf{Formula}$, $g : \mathbf{Digraph} \to \mathbf{Gesture}$ to
a commutative diagram. Despite some progress (see [9]), this conjecture has not
yet been proved.²

Another argument relates to Theorem 1, which guarantees that abstract algebra
is ‘covered’ by gestures. We want to show that this result yields hints towards
the conjectured adjointness

$\text{formulas} \; \underset{\text{mathematics}}{\overset{\text{music}}{\rightleftarrows}} \; \text{gestures}$

Our adjointness conjecture needs two functors, one that produces formulas
from gestures, and one that generates gestures from formulas. We can prove that
there is an argument for such an adjointness when thinking of gestures as being
represented by topological curve structures, while formulas would be represented
by abstract groups.

Let us discuss in more detail how Theorem 1 is technically demonstrated.
We shall see that from the demonstration it also follows that the functor
formulas $\to$ gestures is also musically meaningful. This had already been
observed in [8, Section 6.1]. The proof of Theorem 1 for the group $\mathbb{Z}$ is that
$\mathbb{Z} \cong \pi_1(1, S^1)$, the fundamental group of the circle $S^1$. In this case the elements
$\sum_n \gamma_n [n]$ of the abstract group algebra $\mathbb{C}\mathbb{Z}$ correspond to elements $\sum_n \gamma_n e^{2\pi i n t}$
of the group algebra $\mathbb{C}\pi_1(1, S^1)$, which are precisely the Fourier expressions of
a wave of frequency one. The gestures, i.e., loops in $\pi_1(1, S^1)$, are interpreted
as partials of the Fourier representation, a thoroughly musical perspective. We
want to show now that this musical interpretation also holds for general groups.
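The correspondence between the group algebra $\mathbb{C}\mathbb{Z}$ and Fourier expressions can be illustrated numerically. The following is a minimal sketch under the assumption that an element $\sum_n \gamma_n [n]$ with finitely many nonzero coefficients is stored as a dictionary from $n$ to $\gamma_n$; the function names `multiply` and `evaluate` and the sample coefficients are hypothetical. It checks that the product in the group algebra (a convolution of coefficients) corresponds to the pointwise product of the associated waves.

```python
import cmath


def multiply(a, b):
    """Product in the group algebra C[Z]: [m] * [n] = [m + n], extended linearly."""
    out = {}
    for m, x in a.items():
        for n, y in b.items():
            out[m + n] = out.get(m + n, 0) + x * y
    return out


def evaluate(a, t):
    """Fourier expression sum_n gamma_n * exp(2*pi*i*n*t) attached to an element of C[Z]."""
    return sum(c * cmath.exp(2j * cmath.pi * n * t) for n, c in a.items())


# Two sample elements: a fundamental with one overtone, and a single shifted partial.
a = {1: 1.0, 2: 0.5}
b = {-1: 2.0, 3: 0.25j}

t = 0.3
lhs = evaluate(multiply(a, b), t)
rhs = evaluate(a, t) * evaluate(b, t)
print(abs(lhs - rhs) < 1e-9)
```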

The theorem we can prove is a weaker statement than adjointness, namely a
natural transformation

$\mathrm{Mu2Ma} : \mathbf{HTop}(H, F(G)) \to \mathbf{Grp}(\pi_1(H), G)$

2 The well-known adjointness

$\mathrm{Hom}(\Sigma Z, Y) \cong \mathrm{Hom}(Z, \Omega Y)$

of the suspension functor $\Sigma$ that generates the homotopy cogroup $\Sigma Z$ from a topological
space $Z$, and the loop space functor $\Omega$ that generates the homotopy group
$\Omega Y$ from a topological space $Y$ could be thought of as an additional argument, but we
refrain from this argument here.
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with a functor F : Grp —> HTop from the category Grp of groups to the cate¬ 
gory HTop of homotopy classes of pathwise connected topological spaces. The 
fundamental group functor 7 Ti : HTop —> Grp is viewed as acting on pointed 
topological space classes, but as we suppose that such spaces are pathwise con¬ 
nected, the fundamental group is unique up to isomorphism. 

Proof. The proof of the existence of the natural transformation Mu2Ma rests
on the functor F. If we can show that $\pi_1 \circ F \xrightarrow{\sim} Id_{Grp}$, the natural transformation
can be defined by the fundamental group functor, i.e.,
$Mu2Ma(f : H \to F(G)) := \pi_1(f) : \pi_1(H) \to \pi_1(F(G)) \cong G$.

The functor F is derived from the classical construction of a topological space
from a given group G. It goes as follows; refer to [13, Chapter 3, Sect. 8]
for a thorough presentation. We first need the construction for free groups. To
this end, we use the evident functor $Free : Grp \to Grp$ that sends a group
G to the free group Free(G) generated by the elements of G. The group G is
then recovered by the kernel diagram $Ker(p) \rightarrowtail Free(G) \overset{p}{\twoheadrightarrow} G$ induced by the
identity on G. Given a free group Free(G) that is generated by the set G, we
first define the wedge space $Wedge(G) := \bigvee_{g \in G} S^1_g$, which is the coproduct of G
copies $S^1_g$ of the unit circle $S^1$, glued together at the point 1. It is then straightforward
that $\pi_1(Wedge(G)) \cong Free(G)$. The elements of $\pi_1(1, Wedge(G))$ are the loops
at 1, an evident generalization of the fundamental group construction for the
free group $\mathbb{Z} = Free(1)$. The less trivial part of Theorem 1 is the management
of the kernel Ker(p).

To this end, one uses the method of adjoining 2-cells. One first defines a
continuous map $a_x : S^1 \to Wedge(G)$ that maps the unit circle to the loop
which is defined by the element x of Wedge(G). One then embeds $S^1$ in the
closed unit disk $E^2$, whose boundary is $S^1$. Finally, one takes the topological
quotient space deduced from $Wedge(G) \sqcup \coprod_{x \in Wedge(G)} E^2_x$, one copy $E^2_x$ of $E^2$
for every element $x \in Wedge(G)$, by the identifications of boundary elements
of copies $E^2_x$ with their images via $a_x$, and taking the coherent topology for
the maps $a_x$. This topology conserves the topology of the interiors $E^2_x \setminus S^1$.
Geometrically speaking, we glue the copies $E^2_x$ to the loops x of Wedge(G)
along their boundaries $S^1$. Call this space F(G). The crucial step in this proof is
to show that the space F(G) has in fact the fundamental group $\pi_1(F(G)) \cong G$.
More precisely, the canonical continuous map $F_G : Wedge(G) \to F(G)$, when
given the fundamental group evaluation $\pi_1(F_G)$, yields a group homomorphism
diagram

$Ker(\pi_1(F_G)) \rightarrowtail \pi_1(Wedge(G)) \overset{\pi_1(F_G)}{\twoheadrightarrow} \pi_1(F(G))$

whose kernel is Ker(p), the normal subgroup of $\pi_1(Wedge(G)) \cong Free(G)$
defined above. This implies $\pi_1(F(G)) \cong G$, and we are done. It is straightforward
that the space F(G) is functorial in G, being mediated by the functor Free of
free groups. QED.

The musical interpretation of this construction is now immediate: Coming
back to the idea of a group algebra $\mathbb{C}G$, we may interpret the circles $S^1_g$ of
Wedge(G) as fundamentals of different frequencies $f_g$, one for each g. The
monomials in $\mathbb{C}G$ are related to products of $n_i$-th powers of these fundamentals,
i.e., $\prod_i e^{2\pi \mathrm{i} n_i f_{g_i} t}$. However, these Fourier products are commutative, while
this should be avoided for general groups G. We may therefore step over to
non-commutative compositions of functions $e^{2\pi \mathrm{i} n_i f_{g_i} t}$ by juxtaposing these functions
in time, unfolding their values as spirals along a time line. The relations given by
the 2-cells would then generate a deformation of the cells’ boundaries to singular
points, as shown in Fig. 5.



Fig. 5. The spiral representation of fundamental group algebra elements. 


The elements $Q(t) = \sum_j c_j F_j(t)$ of the group algebra $\mathbb{C}G$ would represent
the time deployment of chord type events, which are defined by the sound events
of different summands $F_j(t)$ of Q(t) at time t.

4 The MuMaGm of Medieval Counterpoint 

In view of the previous arguments for adjunction between mathematics and
music, we take a look at the development of counterpoint from its beginning
with Gregorian chant around 900 A.D. to the final stage developed in the 16th
century by Palestrina. The central result of this development is the establishment
of a stable concept of consonances and dissonances, the Fuxian dichotomy
K/D. Initially, the concept of consonances and dissonances was not the final
one: fourths were consonant (recall the fourth and fifth organum: parallels of
a very early type). The process of stabilization of basic contrapuntal concepts
took more than 600 years. The mathematical wall was that consonant intervals
were seen as individual entities instead of members of a set of a particular
quality. The history of this huge field of theoretical and practical experimentation
is in part traced in books, e.g., Ernst Apfel’s Diskant und Kontrapunkt
des 12. bis 15. Jahrhunderts [2] and Klaus-Jürgen Sachs’s Der Contrapunctus im
14. und 15. Jahrhundert [12]. The historiography shows an extremely complex
meandering movement, where widely differing theoretical approaches, e.g. sixths
being dissonant, were tested by a plethora of composers, rejected, replaced by
other approaches, tested, and so forth. The final result, especially the confirmation
of K/D and the forbidden parallels of fifths, was reached without an
evident logic. But it is important to understand that the mathematically distinguished
structure K/D was reached through a complex musical process, where
individual interval qualities were replaced by a quality of a collaborative set of
intervals. We may view this type of musical creativity as an excellent example of
MuMaGm, a musical game of 600 years’ duration (!) that eventually provided us
with the mathematically excellent solution. The investigation of whatever logic
was responsible for this MuMaGm is an important open research field that could
be supported by the experts in MaMuTh in collaboration with musicologists and
perhaps also AI simulation technology (Fig. 6).



Fig. 6. The MuMaGm from 900 A.D. to Palestrina. 


5 The MuMaGm of Future Counterpoint Worlds 

The present state of the art of counterpoint seems to envisage a second MuMaGm
epoch, where the ‘universe’ of microtonally extended contrapuntal worlds as
described in [1] would be combined and tested by new compositions and eventually
lead to new mathematical insights that transcend our present understanding
of the contrapuntal universe. See Fig. 7 for the entire processual display of past
to future counterpoint.

Fig. 7. The processual MaMuTh and MuMaGm display of past to future counterpoint.

Abstract. This article introduces a method for building and studying
various harmonic structures in the actual conceptual framework of graph
theory. Tone-networks and chord-networks are therefore introduced in a
generalized form, focusing on Hamiltonian graphs, iterated line graphs
and triangles graphs and on their musical meaning. Reference examples
as well as notable music-related Hamiltonian graphs are then presented,
underlining their relevance for composers.

1 Introduction

Hamiltonian graphs, also known as Hamilton graphs, are graphs possessing a
Hamiltonian cycle, i.e. a circuit through a graph that visits each node exactly
once. The concept is closely related to those of Eulerian paths and cycles,
i.e. paths and cycles which visit every edge exactly once. Hamiltonicity has been
widely investigated and exploited in several branches of applied mathematics
and computer science. Given that in the general case testing whether a graph
is Hamiltonian is an NP-complete problem [14], the issue of describing efficient
procedures for finding such graphs under specific conditions still arouses interest.
Additionally, the pursuit of new necessary and sufficient conditions for a
graph to be Hamiltonian is of interest today: developments on the topic are
widely summarized in [8-10].

Furthermore, it is well-known that over the last few decades a geometrical 
approach to music theory has led to several noteworthy examples of music object 
models availing of the theoretical framework of graph theory or implicitly leading 
to its representations, as for example in the works by Albini, Baroin, Bigo, 
Brower, Callender, Cohn, Douthett, Giavitto, Gollin, Lewin, O’Connell, Quinn, 
Spicher, Steinbach and Tymoczko. 

Therefore, this article has two correlated aims. On the one hand, it intends 
to define a specific and detailed paradigm within the distinctive framework of 

© Springer International Publishing AG 2017 
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graph theory in order to represent music objects related to harmony. On the 
other hand, it makes use of Hamiltonicity to model, build, study and eventually 
enumerate certain music structures from the point of view of the paradigm just 
defined. According to the Authors’ view, the overall outcome may be of interest 
not only in the context of abstract Music Theory, but also that of Composition 
and Music Analysis. 

Tone-networks and chord-networks will therefore be introduced, examined in depth
and classified with reference to Lewin’s Generalized Interval System (GIS). Their
Hamiltonicity will then be studied, focusing on two characteristic classes of
chord-networks obtained from given tone-networks: iterated line graphs and the
triangles graph, which will be defined and examined. Their musical properties will
then be shown with the aim of generalizing some concepts introduced in [1]. In
conclusion, considerations will be made on the advantages given by approaching
music-theoretical graphs directly from the point of view of graph theory.

2 Tone-Networks 

Let a tone-network T(Q, H) be a simple vertex-labeled graph whose vertices
represent and are labeled as notes (pitches or pitch-classes) and whose edges
correspond to intervals (or interval-classes). Q is the set of the vertices, H the
set of the edges: the former, in order not to build an empty graph, must contain
at least one element, while the connections are arbitrary. 1

A more formal definition of a tone-network can be formulated by considering 
a Generalized Interval System and generalizing the definition presented in [1]. 

Let us recall the definition of a Generalized Interval System (GIS). A
GIS is an ordered triple $(P, I, \varphi)$, where P is a set of pitches (or pitch classes),
the pitch set, I is an abelian group, the group of intervals, and $\varphi$ is an action
of I on P which is free and transitive. 2

Let Q be a subset of P, $Q \subseteq P$, and H a subset of I, $H \subseteq I$: a tone-network
T(Q, H) is a simple vertex-labeled graph which has precisely one vertex for each
of the elements of Q; moreover, an edge between two different vertices is present
in a tone-network if, and only if, there exists an element in H which maps one of
them into the other. 3
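As an illustration of this definition, the following minimal sketch builds T(Q, H) over the GIS of the n equally tempered pitch classes, assuming that pitch classes are encoded as integers modulo n and that intervals act by addition. The function name `tone_network` and this encoding are our own assumptions, not part of the original formulation.

```python
from itertools import combinations

def tone_network(Q, H, n=12):
    """Vertices and edges of T(Q, H) over the GIS (Z/nZ, Z/nZ, addition)."""
    Q = sorted({q % n for q in Q})
    H = {h % n for h in H}
    # An edge joins two distinct vertices when some element of H maps one
    # of them into the other (in either direction).
    edges = [(p, q) for p, q in combinations(Q, 2)
             if (q - p) % n in H or (p - q) % n in H]
    return Q, edges

# Hypothetical example: all twelve pitch classes linked by fourths and fifths.
vertices, edges = tone_network(range(12), {0, 5, 7})
print(len(vertices), len(edges))
```

With this choice of H every pitch class is linked to the two pitch classes a fifth above and below it, so the resulting tone-network is the circle of fifths.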


1 The well-known term lattice has been deliberately avoided in favor of the more 
abstract term tone-network to define our general note-based graphs. In fact, all 
lattices are tone-networks as we defined them (in particular, the vertex-transitive 
ones) but not all tone-networks are, or can be seen as, lattices. The term chord- 
network followed accordingly. 

2 A definition of the Generalized Interval System equivalent to the one given in [15]. 

3 Although a GIS in its original formulation admits more general musical elements
in its set, a tone-network defined as such admits only Generalized Interval Systems
in which P is a set of pitches, pitch classes or similar one-note musical elements
(such as for example scale degrees). This allows us to build a framework in which
certain graphs obtained from tone-networks always represent chords or general
n-note musical elements (such as for example a collection of scale degrees).

Since a tone-network is a simple graph, the identity element has no importance
in defining it. Furthermore, the inverse of an element of I always connects
the same pair of vertices. In order to avoid confusion, we will suppose that the
identity element is always in H and that, if an element of I is in H, then
its inverse is in H too. This definition allows us to relate tone-networks
to a mathematically clear and consistent model of pitches and intervals, showing
some underlying properties of the graph itself. For this purpose, let us now
classify tone-networks over their GIS.

Given a GIS $(P, I, \varphi)$, we shall say that a tone-network T(Q, H) is:

- $(P, I, \varphi)$-complete if Q = P and H = I;

- $(P, I, \varphi)$-proper if H is a set of generators of I and Q = P;

- $(P, I, \varphi)$-unproper if H is a proper, finite and non-empty subset of I but not
a set of its generators and Q = P;

- $(P, I, \varphi)$-arbitrary if Q is a proper, finite and non-empty subset of P ($Q \subset P$)
and H is a non-empty subset of I ($H \subseteq I$).
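The four cases can be sketched as a small decision procedure. The sketch assumes the GIS of the n equally tempered pitch classes with $I = \mathbb{Z}/n\mathbb{Z}$, where a subset H generates I exactly when the greatest common divisor of its elements and n is 1. The function name `classify` is hypothetical, and a complete tone-network, which also satisfies the condition for being proper, is reported under its most specific label.

```python
from functools import reduce
from math import gcd

def classify(Q, H, n=12):
    """Label T(Q, H) over the GIS of the n equally tempered pitch classes."""
    P = set(range(n))                 # pitch classes; I = Z/nZ has the same elements
    Q = {q % n for q in Q}
    H = {h % n for h in H}
    if not Q or not H:
        raise ValueError("Q and H must be non-empty")
    if Q != P:
        return "arbitrary"
    if H == P:
        return "complete"
    # H generates Z/nZ exactly when gcd(n, h1, h2, ...) == 1.
    if reduce(gcd, H, n) == 1:
        return "proper"
    return "unproper"

print(classify(range(12), {0, 3, 9, 4, 8, 5, 7}))
print(classify(range(12), {0, 4, 8}))
```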

Let us show some results, achieved by relating tone-networks to their GIS,
that exhibit graphical characterizations of tone-networks and that will be important
in order to study their Hamiltonicity. The following Proposition 1 is well-known,
but for a more self-contained exposition we prove it in detail.

Proposition 1. If a tone-network is $(P, I, \varphi)$-complete, its graph is complete
and its order is such that $|T(P, I)| = |P| = |I|$.

Proof. To show that for any GIS $(P, I, \varphi)$ it is always true that $|P| = |I|$, let’s
consider $p_0 \in P$ and define a map $f : I \to P$ such that for any $g \in I$, $f(g) = g(p_0)$.
Since $\varphi$ is transitive, f is surjective; since $\varphi$ is free, f is injective: therefore
$|P| = |I|$ for any GIS. Since the action $\varphi$ is free and transitive, given two
elements of P there always is a unique element of I which maps the first into the
latter. Thus every vertex of T(P, I) is connected with all the other ones and the
graph is complete. Hence $|T(P, I)| = |P|$. □

It is worth noting that, from a musical point of view, a $(P, I, \varphi)$-complete
tone-network is the graphical representation of the GIS $(P, I, \varphi)$ itself. Moreover,
if a tone-network is a complete graph, then there exists a GIS $(P, I, \varphi)$ for which
that tone-network is $(P, I, \varphi)$-complete.

Proposition 2. If a tone-network T(Q, H) is $(P, I, \varphi)$-arbitrary, its graph is
k-regular with $k = |H \setminus \{e\}| = |\{h \in H \mid h \neq e\}|$ if and only if for any $q \in Q$
and for any $h \in H$, $h(q) \in Q$.

Proof. First let’s prove that if, for any $q \in Q$ and for any $h \in H$, $h(q) \in Q$,
then the graph of a $(P, I, \varphi)$-arbitrary tone-network T(Q, H) is k-regular. Let’s
consider $q_0 \in Q$ and define a map whose domain is $H \setminus \{e\}$ and whose codomain
is the set of edges of T(Q, H) to which $q_0$ belongs. For any $h \in H \setminus \{e\}$, it is then
true that $h(q_0) \neq q_0$, so $q_0$ and $h(q_0)$ are connected; the edge between them is
written as $\{q_0, h(q_0)\}$. Let’s now define $f(h) = \{q_0, h(q_0)\}$. Since $\varphi$ is free, f is
injective; f is also surjective because of the properties of H: in fact any edge $q_0$
belongs to is of the type $\{q_0, h(q_0)\}$, or, if $q_0 = h(q_1)$, it is of the type $\{q_1, q_0\}$,
and the latter is such that $\{q_1, q_0\} = \{q_0, h^{-1}(q_0)\}$. So domain and codomain
have the same cardinality, which is that of $H \setminus \{e\}$, and what has been proved
is true for any $q_0 \in Q$: hence the graph is k-regular and $k = |H \setminus \{e\}|$.

Let’s now prove by contradiction that if T(Q, H) is k-regular with $k = |H \setminus \{e\}|$
then, for any $q \in Q$ and for any $h \in H$, $h(q) \in Q$. Suppose there are $q_0 \in Q$ and
$h_0 \in H \setminus \{e\}$ such that $h_0(q_0) \notin Q$. How many edges does $q_0$ belong to? All of them
must be of the type $\{q_0, h(q_0)\}$ with $h \in H \setminus \{e\}$. So they are no more than
$|H \setminus \{e\}| - 1$, since $h_0(q_0) \notin Q$. This contradicts the supposition that the graph
is regular with valency $|H \setminus \{e\}|$. □

Note that the sole hypothesis that a tone-network T(Q, H) is regular does
not imply that, for any $q \in Q$ and for any $h \in H$, $h(q) \in Q$ as well. In
effect, a counterexample can be the following: let us consider a regular
tone-network T(Q, H) built over the GIS of the twelve equally tempered pitch
classes $(\{C, C\sharp, \ldots, B\}, I = \mathbb{Z}/12\mathbb{Z}, \varphi)$ such that Q = {C, D, E, F, G, A, B} and
H = {0, 1, 2, 10, 11}. Although its graph is 2-regular, it is evident that for example
$1(G) = G\sharp \notin Q$.
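This counterexample can be checked mechanically. The sketch below assumes the usual encoding of pitch classes as integers modulo 12 (C = 0, C♯ = 1, ..., B = 11); the variable names are ours.

```python
from itertools import combinations

n = 12
Q = [0, 2, 4, 5, 7, 9, 11]      # C, D, E, F, G, A, B
H = {0, 1, 2, 10, 11}

# Degree of every vertex of T(Q, H). H is closed under inverses,
# so testing one direction of each pair is enough.
degree = {q: 0 for q in Q}
for p, q in combinations(Q, 2):
    if (q - p) % n in H:
        degree[p] += 1
        degree[q] += 1

# Closure condition of Proposition 2: h(q) must stay in Q for all q and h.
closed = all((q + h) % n in Q for q in Q for h in H)

print(sorted(set(degree.values())), closed)   # prints: [2] False
```

The graph is 2-regular, whereas the closure condition would require valency $|H \setminus \{e\}| = 4$; regularity alone therefore does not force the condition of Proposition 2.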

Proposition 3. If a tone-network T(Q, H) is $(P, I, \varphi)$-proper or is $(P, I, \varphi)$-unproper,
its graph is k-regular with $k = |H| - 1$.

Proof. Since for $(P, I, \varphi)$-proper and for $(P, I, \varphi)$-unproper tone-networks Q = P,
it is always true that for any $q \in Q$ and for any $h \in H$, $h(q) \in Q$. By the argument
used for Proposition 2, the graph is then k-regular with $k = |H \setminus \{e\}| = |H| - 1$,
since the identity is always in H. □

In addition, given that a Cayley graph 4 X(G, S) is the graph with vertex
set G and edge set $\{gh : hg^{-1} \in S\}$, where G is a group and S a subset of G
that is closed under taking inverses and does not contain the identity, we can
show the following.

Proposition 4. All the $(P, I, \varphi)$-unproper, $(P, I, \varphi)$-proper and $(P, I, \varphi)$-complete
tone-networks are isomorphic to Cayley graphs.

Proof. Given a GIS $(P, I, \varphi)$, $H \subseteq I$ and $H^* = H \setminus \{e\}$, we will show that
the tone-network T(P, H) is isomorphic to the Cayley graph $X(I, H^*)$. Since
all $(P, I, \varphi)$-unproper, $(P, I, \varphi)$-proper and $(P, I, \varphi)$-complete tone-networks are
different instances of T(P, H), this will prove the Proposition.

Let’s consider $v \in P$ and define a map $f : I \to P$ such that for any $g \in I$,
$f(g) = g(v)$. Since the action $\varphi$ is transitive, f is surjective; since $\varphi$ is free, f
is injective: therefore f is a bijection from the vertex set of the Cayley graph
$X(I, H^*)$ to that of the tone-network T(P, H).

Let’s now show that f preserves adjacency. Let gh be an edge in $X(I, H^*)$: we
need to show that g(v) and h(v) are connected in T(P, H). In fact $g(v) \neq h(v)$,
because the identity is not in $H^*$, and $(hg^{-1})(g(v)) = h(v)$, hence there is $l \in H$
such that $l(g(v)) = h(v)$. Vice versa, let’s suppose there is an edge in T(P, H)
connecting g(v) and h(v). We know that $g(v) \neq h(v)$ and also that there is
$k \in H$ such that $k(g(v)) = h(v)$; k cannot be the identity element, so $k \in H^*$.
Furthermore $hg^{-1}(g(v)) = h(v)$ and, since $\varphi$ is free, $k = hg^{-1}$. Hence $k \in H^*$
and gh is an edge of the Cayley graph $X(I, H^*)$. □

4 Proper directed Cayley graphs are also known as Cayley graphs. The definition we
present of undirected Cayley graphs is the one offered in [7].

A Cayley graph X(G, S) is connected if, and only if, S is a set of generators
of G, cf. [7]. Furthermore it is well-known that the complement of a
disconnected graph is connected. Thus, the following corollaries derive directly
from the definitions of $(P, I, \varphi)$-complete, $(P, I, \varphi)$-proper and $(P, I, \varphi)$-unproper
tone-networks.

Corollary 5. All the $(P, I, \varphi)$-proper and $(P, I, \varphi)$-complete tone-networks
T(Q, H) are connected.

Corollary 6. All the $(P, I, \varphi)$-unproper tone-networks T(Q, H) are disconnected
and their complements are $(P, I, \varphi)$-proper. 5
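The two corollaries can be illustrated on the GIS of the twelve pitch classes by counting connected components. The sketch assumes Q = P encoded as the integers modulo n and an interval set H closed under inverses; the function name `components` is hypothetical.

```python
def components(H, n=12):
    """Number of connected components of T(Z/nZ, H), H closed under inverses."""
    seen, count = set(), 0
    for start in range(n):
        if start in seen:
            continue
        count += 1                      # a new component begins at `start`
        seen.add(start)
        stack = [start]
        while stack:
            p = stack.pop()
            for h in H:
                q = (p + h) % n
                if q not in seen:
                    seen.add(q)
                    stack.append(q)
    return count

unproper = {0, 4, 8}                                  # does not generate Z/12Z
complement = {h for h in range(12) if h not in {4, 8}}  # intervals of the complement graph

print(components({0, 3, 9, 4, 8, 5, 7}), components(unproper), components(complement))
```

In this example the proper network is connected, the unproper one splits into the four augmented triads, and its complement is connected again.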

3 Chord-Networks 

Likewise we define a chord-network T(C, R) as a simple vertex-labeled graph
whose vertices represent and are labeled as chords (ordered or unordered sets of
pitches or pitch-classes) and whose edges correspond to chord transformations.

Note that a chord-network, as it has just been defined, can potentially include
sets of notes of different cardinalities, representing and mapping transformations
between chords of different sizes. Chord-networks can then be built from scratch
or, more interestingly, they can be derived from tone-networks under certain
kinds of graph duality or any other type of construction. In this paper, two
methods for building chord-networks will be introduced: chord-networks as iterated
line graphs and as triangles graphs of tone-networks.

Given a graph W, its line graph L(W) is a new simple undirected graph
such that each vertex of L(W) is an edge of W, and where two vertices of
L(W) are adjacent if, and only if, the corresponding edges are incident in W.
Thus L(T(Q, H)), the line graph of a tone-network, is a labeled chord-graph
whose vertices are all the unordered bichords comprising connected notes in
T(Q, H) so that two bichords are adjacent in L(T(Q, H)) if, and only if, they
share an element of Q. Line graphs present some peculiar properties that make
them easy to handle. First of all, it is always easy to determine the number
of vertices $v_{L(W)}$ and edges $e_{L(W)}$ of a line graph L(W) where W consists of
$v_W$ vertices, that have valency $d_i$, and $e_W$ edges. Indeed, $v_{L(W)} = e_W$ and
$e_{L(W)} = -e_W + \frac{1}{2}\sum_{i=1}^{v_W} d_i^2$. Determining them is easier if W is a complete graph
$K_n$, because $L(K_n) = J(n, 2)$ where J(n, k) is the Johnson graph, such that
$v_{J(n,2)} = \binom{n}{2}$ and $e_{J(n,2)} = (n - 2)\binom{n}{2}$, cf. [11]. Furthermore, if a graph W is
k-regular, then its line graph L(W) is regular as well with valency $2k - 2$, cf. [7].
This leads us to the following proposition, without the need of a proof.

5 Note that the complement of a $(P, I, \varphi)$-proper tone-network is not always
$(P, I, \varphi)$-unproper!

Proposition 7. If a tone-network T(Q, H) is $(P, I, \varphi)$-complete, $(P, I, \varphi)$-proper
or $(P, I, \varphi)$-unproper, its line graph is regular with valency $2k - 2$, where
k is the valency of T(Q, H), i.e. the number of intervals in H which are different
from the identity.
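The counting formulas for line graphs and the valency $2k - 2$ can be verified on a small case. The following sketch uses the complete graph $K_5$ as a hypothetical example; the representation of edges as frozensets and the function name `line_graph` are our own choices.

```python
from itertools import combinations

def line_graph(edges):
    """Edge set of L(W); the vertices of L(W) are the edges of W themselves."""
    verts = [frozenset(e) for e in edges]
    # Two edges of W are adjacent in L(W) when they share an endpoint.
    return {frozenset((a, b)) for a, b in combinations(verts, 2) if a & b}

n = 5
K = {frozenset(e) for e in combinations(range(n), 2)}   # complete graph K_5
L = line_graph(K)

# Edge count of L(W) from the valencies d_i of W: -e_W + (1/2) * sum d_i^2.
degree = {v: sum(v in e for e in K) for v in range(n)}
by_formula = -len(K) + sum(d * d for d in degree.values()) // 2

# Valencies occurring in L(K_5); the expected value is 2k - 2 with k = n - 1.
valencies = {sum(v in e for e in L) for v in K}

print(len(K), len(L), by_formula, valencies)   # prints: 10 30 30 {6}
```

Both counts agree with $e_{J(5,2)} = (5 - 2)\binom{5}{2} = 30$, and every vertex of the line graph has valency $2 \cdot 4 - 2 = 6$.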

The process of building line graphs can indeed be iterated 6.
$L^2(T(Q, H)) = L(L(T(Q, H)))$ is then the line graph of the line graph of the tone-network
T(Q, H), the 2-iterated line graph. Its vertices are all the trichords (labeled
with one of the three inversions of the trichord) which can be seen as paths of
length 2 on T(Q, H) and that are connected on $L^2(T(Q, H))$ if, and only if, they share
an edge. Musically speaking, connections on $L^2(T(Q, H))$ represent relations
between trichords with two or all three notes in common. This graph is quite
large and detailed if compared for example to a topological dual under some
immersion: every possible trichord that can be built in T(Q, H) is presented in
all the possible inversions 7, implying a much increased set of different chord
transformations. Enumerating its vertices and edges is again practical, iterating
the calculation made for L(T(Q, H)). Moreover, Proposition 7 can be generalized
with the following one.

Proposition 8. If a tone-network T(Q, H) is $(P, I, \varphi)$-complete, $(P, I, \varphi)$-proper
or $(P, I, \varphi)$-unproper, then for any n > 0 its n-iterated line graph is
regular with valency $2^n k - 2(2^n - 1)$, where k is the valency of T(Q, H).

Proof. Let $v_n$ be the valency of the n-iterated line graph. The statement holds
for n = 1, in fact: $v_1 = 2^1 k - 2(2^1 - 1) = 2k - 2$. Let’s now show that if the
statement holds for $v_n$, then it also holds for $v_{n+1}$. In fact,
$v_{n+1} = 2v_n - 2 = 2(2^n k - 2(2^n - 1)) - 2 = 2(2^n k - 2^{n+1} + 2 - 1) = 2^{n+1} k - 2(2^{n+1} - 1)$. □
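As a quick sanity check of the induction, the closed formula can be compared with the recurrence $v_{n+1} = 2v_n - 2$ for a range of values. The function names below are hypothetical and the sketch only restates the proof numerically.

```python
def valency_closed_form(n, k):
    """Valency of the n-iterated line graph according to Proposition 8."""
    return 2**n * k - 2 * (2**n - 1)

def valency_by_recurrence(n, k):
    """Apply v -> 2v - 2 (the line graph of a v-regular graph) n times."""
    v = k
    for _ in range(n):
        v = 2 * v - 2
    return v

agree = all(valency_closed_form(n, k) == valency_by_recurrence(n, k)
            for n in range(1, 9) for k in range(1, 13))
print(agree, valency_closed_form(2, 6))   # prints: True 18
```

For a 6-regular tone-network, for instance, the line graph is 10-regular and the 2-iterated line graph is 18-regular.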

Finally, the triangles graph $\Delta(G)$ of a given graph G is defined as the
simple undirected graph having as vertices all the 3-cycles of G and such that an edge
between two vertices exists in $\Delta(G)$ if, and only if, the corresponding 3-cycles
in G share an edge. Thus, the triangles graph of a tone-network, $\Delta(T(Q, H))$, is
such that its vertices are all the trichords (labeled with an unordered set of three
notes) that can be formed and closed 8 in T(Q, H) and such that two vertices
are connected by an edge if, and only if, their trichords share two notes out of
three. Clearly, the topological dual of a tone-network T(Q, H) under a triangular
immersion in a two-dimensional surface is a subgraph of $\Delta(T(Q, H))$.
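A brute-force construction of the triangles graph may help to fix ideas. The sketch below is an illustration under our own assumptions: pitch classes are integers modulo 12, the interval set {0, 3, 9, 4, 8, 5, 7} is used as an example, and the function name `triangles_graph` is hypothetical.

```python
from itertools import combinations

N = 12
H = {0, 3, 9, 4, 8, 5, 7}          # example interval set, closed under inverses
edges = {frozenset(p) for p in combinations(range(N), 2)
         if (p[1] - p[0]) % N in H}

def triangles_graph(vertices, edges):
    """Vertices: 3-cycles of the graph; links: pairs of 3-cycles sharing an edge."""
    tris = [frozenset(t) for t in combinations(vertices, 3)
            if all(frozenset(p) in edges for p in combinations(t, 2))]
    links = {frozenset((a, b)) for a, b in combinations(tris, 2)
             if len(a & b) == 2}
    return tris, links

tris, links = triangles_graph(range(N), edges)
print(len(tris), len(links))
```

For this interval set the 3-cycles found are the 24 major and minor triads together with the 4 augmented triads, i.e. 28 vertices in total; two of them are linked exactly when they share two notes.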


6 This paper is limited to n-iterated line graphs with n < 3, but it is possible to extend 
the results and study the cases for n > 3. 

7 A chord can be redundantly presented in case it is a limited transposition one. 

8 $L^2(T(Q, H))$ represents paths of length two on T(Q, H), while $\Delta(T(Q, H))$ represents
3-cycles on T(Q, H). It means that the former admits all the trichords in all
the possible inversions that can be built on T(Q, H). In fact, it could be that some
inversions are not possible because of missing edges/intervals. The latter considers
only the trichords that admit all the inversions. As a matter of fact a vertex in
$\Delta(T(Q, H))$ represents all of them.

4 Hamiltonicity of Tone-Networks and Chord-Networks

Tone-network and chord-network Hamiltonicity shows interesting properties.
First of all, Hamiltonian cycles in a tone-network represent a complete and cyclic
sequence of notes that recalls and extends the concept of a twelve-tone row. In
fact, the Hamiltonian cycles of the complete tone-network $T(\{C, C\sharp, \ldots, B\}, I = \mathbb{Z}/12\mathbb{Z})$,
built over the GIS of the twelve equally tempered pitch classes
$(\{C, C\sharp, \ldots, B\}, I = \mathbb{Z}/12\mathbb{Z}, \varphi)$, are all the twelve-tone rows. Building and enumerating
twelve-tone rows - or similar sequences of notes - with specific needs
is then simple to do just by shaping the starting tone-network. The issue itself
of knowing if a row exists under certain conditions brings us to the problem of
checking the existence of a Hamiltonian cycle.

On the other hand, Hamiltonian cycles of chord-networks represent complete 
sequences through all the admitted chords which only consider certain kinds of 
transformations. For example, in [1], Hamiltonian cycles in the Chicken-Wire 
Torus 9 have been deeply studied from a theoretical point of view as well as 
from that of composition and music analysis. They represent complete sequences 
through all twenty-four major and minor triads employing neo-Riemannian
PLR-transformations and in which each major and minor triad is used once and once
only. “These cycles are exclusively triadic and overall completely chromatic, since
every pitch class is used exactly six times. As stated in [1]: the succession can 
also be more or less diatonic, depending on the patterns of the transformations 
that are employed. So these classes of cycles could be a useful compositional 
device to define harmonic structures that are triadic (and in some cases locally 
diatonic) but without any real tonal center.” 10 

From a mathematical point of view, checking if a tone-network or a chord-network
is Hamiltonian can be quite easy in some specific cases. In fact, although
testing whether a graph is Hamiltonian is an NP-complete problem, cf. [14], and
it is therefore difficult to decide whether a graph is Hamiltonian or not in the
general case, the pursuit of necessary and sufficient conditions for a graph to be
Hamiltonian continues to provide new results year after year, see [8-10]. Let us
present some of them that are useful for our purposes.

Well-known historical results of wide application are, in chronological order,
Dirac’s [5] and Ore’s [16] theorems. 11

Theorem 9 (Dirac, 1952). A graph with $n \geq 3$ vertices is Hamiltonian if each
of its vertices has degree greater than or equal to n/2.

Theorem 10 (Ore, 1960). A graph with $n \geq 3$ vertices is Hamiltonian if for
every pair of non-adjacent vertices u and v it is true that $d_u + d_v \geq n$.
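Both conditions are easy to test on a given graph. The sketch below is a minimal illustration with hypothetical function names; the two example graphs are tone-networks over the twelve pitch classes encoded as integers modulo 12. Note that the conditions are sufficient but not necessary: the circle of fifths is itself a Hamiltonian cycle, yet it satisfies neither of them.

```python
from itertools import combinations

def degrees(vertices, edges):
    deg = {v: 0 for v in vertices}
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg

def satisfies_dirac(vertices, edges):
    """Dirac: n >= 3 and every degree is at least n / 2."""
    n = len(vertices)
    return n >= 3 and all(2 * d >= n for d in degrees(vertices, edges).values())

def satisfies_ore(vertices, edges):
    """Ore: n >= 3 and d_u + d_v >= n for every non-adjacent pair u, v."""
    n = len(vertices)
    deg = degrees(vertices, edges)
    adjacent = {frozenset(e) for e in edges}
    return n >= 3 and all(deg[u] + deg[v] >= n
                          for u, v in combinations(vertices, 2)
                          if frozenset((u, v)) not in adjacent)

V = list(range(12))
six_regular = [(p, q) for p, q in combinations(V, 2) if (q - p) % 12 in {3, 4, 5, 7, 8, 9}]
fifths = [(p, q) for p, q in combinations(V, 2) if (q - p) % 12 in {5, 7}]

print(satisfies_dirac(V, six_regular), satisfies_ore(V, six_regular))
print(satisfies_dirac(V, fifths), satisfies_ore(V, fifths))
```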


9 A representation introduced by Douthett and Steinbach in [6]. 

10 Cf. [1]. 

11 Note that Dirac’s theorem is a corollary of Ore’s. In 1976 Bondy and Chvátal proved
a more general result of which Dirac’s and Ore’s theorems are both corollaries.

A number of authors have independently proved the following result regarding
Hamiltonicity of Cayley graphs on abelian 12 groups [18].

Proposition 11. Every connected Cayley graph on an abelian group is Hamiltonian.

This along with Proposition 4 and its corollaries leads to the following fundamental
corollary.

Corollary 12. $(P, I, \varphi)$-complete and $(P, I, \varphi)$-proper tone-networks are Hamiltonian.

In Sect. 6, we will need to study a Cayley graph built on the dihedral group
$D_{24}$. Since $D_{24}$ is not abelian, Proposition 11 cannot be applied. However, the
following, shown in [12], is likewise true.

Proposition 13. Every connected 3-regular Cayley graph on a dihedral group
is Hamiltonian.

In general, determining if a line graph L(W) is Hamiltonian can be much
simplified by knowing if W is Eulerian or Hamiltonian itself. In fact, as shown
in [11], if W has an Euler cycle (i.e. if it is connected and has an even number
of edges at each vertex) then its line graph L(W) is Hamiltonian. Moreover, the
line graph of a Hamiltonian graph is itself Hamiltonian. The following propositions
are then always true.

Proposition 14. If a tone-network T(Q, H) is Hamiltonian, then L(T(Q, H))
is Hamiltonian as well.

Proposition 15. If a tone-network T(Q, H) is connected and has an even number
of edges at each vertex, then L(T(Q, H)) is Hamiltonian.

Finally, the following result, shown in [7], can be useful to prove the Hamiltonicity
of certain chord-networks.

Proposition 16. If a group acts freely and transitively on the vertices
of a graph, then that graph is a Cayley graph.

5 Twelve-Tone Rows as Hamiltonian Cycles 

Several twelve-tone rows used by composers in notable scores are limited to
certain intervals and are characterized by a unique sonority which sometimes
reveals a mixture of serialism and tonality. Alban Berg’s Violin Concerto (1935)
and its system of triads or Arvo Pärt’s Symphony No. 2 (1966) are just two of
the many scores with these features.

12 Lewin did not require the group of a GIS to be abelian [15], but we think that
commutativity is a strong requirement for the intuitiveness and consistency of a
group of intervals. Nevertheless, the result of Proposition 11 seems to apply also to
the general case: excluding $K_2$, all the known non-Hamiltonian vertex-transitive
graphs are not Cayley graphs and this leads to the conjecture that all Cayley graphs
are Hamiltonian [12].

Fig. 1. Twelve-tone rows from Alban Berg’s Violin Concerto (left) and from Arvo
Pärt’s Symphony No. 2 (right).


As shown in Fig. 1, they are built on a limited set of intervals - the former:
2, 3, 4 and implicitly their inverses 10, 9 and 8; the latter: 1, 2, 3 and their
inverses 11, 10 and 9 - and they link the last tone of the row with the first
one again with one interval of the limited set. Thus, both can be represented as
Hamiltonian cycles of two proper tone-networks built over the GIS of the twelve
equally tempered pitch classes $(\{C, C\sharp, \ldots, B\}, I = \mathbb{Z}/12\mathbb{Z}, \varphi)$. Clearly, both
rows also feature other interesting structural qualities: Berg’s presents
four overlapping triads built on the four open strings of the violin, while Pärt’s
is made up of one single tetrachord which has then been transposed twice.
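The claim for Berg’s row can be checked directly. The sketch below assumes the row as it is commonly cited (G, B♭, D, F♯, A, C, E, G♯, B, C♯, E♭, F), encoded as pitch classes modulo 12 with C = 0; since the figure is not reproduced here, this encoding and the function name `is_hamiltonian_cycle` are our own assumptions.

```python
# Berg's row as commonly cited: G Bb D F# A C E G# B C# Eb F
BERG = [7, 10, 2, 6, 9, 0, 4, 8, 11, 1, 3, 5]

H_BERG = {0, 2, 3, 4, 8, 9, 10}     # intervals 2, 3, 4 and their inverses
H_PAERT = {0, 1, 2, 3, 9, 10, 11}   # intervals 1, 2, 3 and their inverses

def is_hamiltonian_cycle(row, H, n=12):
    """True if `row` visits every pitch class once and each step, including
    the return from the last tone to the first, is an interval of H."""
    if sorted(row) != list(range(n)):
        return False
    return all((row[(i + 1) % n] - row[i]) % n in H for i in range(n))

print(is_hamiltonian_cycle(BERG, H_BERG), is_hamiltonian_cycle(BERG, H_PAERT))
```

The row closes up inside the first tone-network, while it is not a cycle of the second one, because it uses the interval 4.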

Composers who want to know whether rows of this kind exist (footnote 13) under
specific conditions, and who wish to enumerate and study them, can find a
powerful tool in our theoretical framework. For example, thanks to Corollary 6,
they can be sure that no rows can be built on improper tone-networks. Indeed,
both Berg’s and Pärt’s examples consider a tone-network T(Q, H) where H is
a set of generators for the group of intervals of the underlying GIS. Moreover,
thanks to Theorem 9, they can always begin from a complete tone-network and
turn it into an arbitrary one, cutting edges according to their needs while
keeping the valency of every vertex greater than or equal to half the total number
of vertices. In this way they are sure that they can find Hamiltonian cycles, and
thus rows, on it. The practical computation of the rows, seen as Hamiltonian
cycles in suitable tone-networks, can then be done with graph theory computer
programs (footnote 14). In fact, while answering the question ‘does a twelve-tone
row under certain conditions exist’ is easy in some interesting cases, the same
cannot be said for the issue of counting them. The NP-completeness of the general
Hamiltonian cycle problem makes the use of a computer necessary to accomplish the
task. At least, though, the graph-theoretical framework reformulates the problem
in a way that makes it quick and easy to represent and to implement a solution.
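The reading of a row as a Hamiltonian cycle can be sketched in a few lines of
Python. This is only an illustration, not the software used by the authors: the
function names are hypothetical, pitch classes are encoded as integers with
C = 0, and the Berg row is entered as G, Bb, D, F#, A, C, E, G#, B, C#, Eb, F.
The backtracking search returns one cycle (or None) and makes no attempt to
count them.

```python
def interval_graph(intervals, n=12):
    """Adjacency sets of the graph on Z/nZ whose edges are the given intervals and their inverses."""
    steps = {i % n for i in intervals} | {(-i) % n for i in intervals}
    steps.discard(0)
    return {v: {(v + s) % n for s in steps} for v in range(n)}


def is_row_cycle(row, intervals, n=12):
    """True if `row` uses every pitch class once and closes up using only the allowed intervals."""
    graph = interval_graph(intervals, n)
    if sorted(row) != list(range(n)):
        return False
    return all(row[(k + 1) % n] in graph[row[k]] for k in range(n))


def find_row(intervals, n=12, start=0):
    """Backtracking search for one Hamiltonian cycle; returns a list of pitch classes or None."""
    graph = interval_graph(intervals, n)
    path, seen = [start], {start}

    def extend():
        if len(path) == n:
            return path[0] in graph[path[-1]]  # the cycle must close
        for nxt in sorted(graph[path[-1]]):
            if nxt not in seen:
                path.append(nxt)
                seen.add(nxt)
                if extend():
                    return True
                path.pop()
                seen.remove(nxt)
        return False

    return list(path) if extend() else None


# Row of Berg's Violin Concerto as pitch classes (assumed encoding, C = 0).
BERG = [7, 10, 2, 6, 9, 0, 4, 8, 11, 1, 3, 5]

print(is_row_cycle(BERG, {2, 3, 4}))  # the row closes up with intervals 2, 3, 4
print(find_row({1, 2, 3}))            # one row on the intervals 1, 2, 3
print(find_row({4}))                  # {4} does not generate Z/12Z
```

The last call illustrates the role of the generating condition: when the
interval set does not generate the whole group, the search space is
disconnected and no row can be found.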

6 The Tonnetz, Its Chord-Networks and Their 
Hamiltonicity 

As reference examples we present the Tonnetz, perhaps one of the oldest and
most iconic music-theoretical graphs, which in our theoretical framework is the
tone-network Ton = T({C, C#, ..., B}, {0, 3, 9, 4, 8, 5, 7}) built over the GIS of
the twelve equally tempered pitch classes ({C, C#, ..., B}, I = Z/12Z, φ), and
some derived chord-networks: L(Ton), L^2(Ton), A(Ton) and its topological dual
under its immersion in the torus, D(Ton), the Chicken-Wire Torus by Douthett
and Steinbach, cf. [6].

13 This can obviously also be done in an arbitrary n-tone system.

14 The software employed by the authors is Groups & Graphs version 3.6 by William
Kocay.

Since {0, 3, 9, 4, 8, 5, 7} is a set of generators of Z/12Z and its vertices are all
the elements of the underlying GIS, Ton is a ({C, C#, ..., B}, I = Z/12Z, φ)-proper
tone-network, a 6-regular graph (Propositions) with 12 vertices and 36
edges, and it is isomorphic to a connected Cayley graph (Proposition 4 and
Corollary 5). Even if we had not already known Ton, we could have derived all these
properties and imagined its shape without drawing a single vertex. This
provides substantive information for testing its Hamiltonicity and the Hamiltonicity
of some of the chord-networks we can build from it.

The same applies to its n-iterated line graphs. L(Ton) is a 10-regular graph
with 36 vertices and 180 edges. This means that there are 36 different bichords in
the Tonnetz and 10 different transformations joining them while preserving one
pitch-class of the two. Consequently, L^2(Ton) is a rather large 18-regular graph:
it has 180 vertices, which are all the possible trichords that can be built on the
Tonnetz in all their possible inversions, and 1620 edges in total.
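These vertex, edge and valency counts can be checked mechanically. The sketch
below is an illustration under stated assumptions, not the authors' procedure:
it models Ton as the graph on Z/12Z whose edges are the nonzero intervals of
the generating set, and builds line graphs by declaring two edges adjacent
when they share an endpoint. All function names are hypothetical.

```python
from itertools import combinations


def cayley_graph(n, generators):
    """Edge set of the undirected graph on Z/nZ generated by the nonzero intervals."""
    steps = {g % n for g in generators} | {(-g) % n for g in generators}
    steps.discard(0)
    return {frozenset((v, (v + s) % n)) for v in range(n) for s in steps}


def line_graph(edges):
    """Edge set of the line graph: two original edges are adjacent when they share a vertex."""
    return {frozenset((e, f)) for e, f in combinations(edges, 2) if e & f}


def valencies(vertices, edges):
    """Set of vertex degrees occurring in the graph."""
    degree = {v: 0 for v in vertices}
    for edge in edges:
        for v in edge:
            degree[v] += 1
    return set(degree.values())


ton = cayley_graph(12, {0, 3, 9, 4, 8, 5, 7})
l1 = line_graph(ton)   # vertices of L(Ton) are the edges of Ton
l2 = line_graph(l1)    # vertices of L^2(Ton) are the edges of L(Ton)

print(len(ton), valencies(range(12), ton))
print(len(l1), valencies(ton, l1))
print(len(l2), valencies(l1, l2))
```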

Being a ({C, C#, ..., B}, I = Z/12Z, φ)-proper tone-network, Ton is
Hamiltonian (Proposition 11), and so are L(Ton) and L^2(Ton)
(Corollary 12).

D(Ton), whose Hamiltonicity has already been studied in [1], is a 3-regular
connected graph with 24 vertices (all the major and minor triads) and 36 edges,
which represent just three parsimonious voice-leading transformations: the
neo-Riemannian P (Parallel), R (Relative) and L (Leading-Tone Exchange). In [1]
the Hamiltonicity of D(Ton) was established by an explicit computation of all
its Hamiltonian cycles, but it is also possible to prove it from a mathematical
point of view. In fact, the dihedral group D_24 is the group of automorphisms of
D(Ton), which acts freely and transitively on it, and P, L and R constitute
a set of its generators. Thus, thanks to Proposition 16 and to Proposition 13, we
can prove that D(Ton) is a Cayley graph, and thus Hamiltonian.
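As a small computational complement, the following sketch builds D(Ton) from
the three neo-Riemannian operations and verifies that alternating L and R from
C major visits all 24 triads before returning to the start. The encoding of a
triad as a pair (root pitch class, "M" or "m") and the function names are
assumptions made for the illustration.

```python
def P(triad):
    """Parallel: C major <-> C minor."""
    root, quality = triad
    return (root, "m" if quality == "M" else "M")


def L(triad):
    """Leading-tone exchange: C major <-> E minor."""
    root, quality = triad
    return ((root + 4) % 12, "m") if quality == "M" else ((root + 8) % 12, "M")


def R(triad):
    """Relative: C major <-> A minor."""
    root, quality = triad
    return ((root + 9) % 12, "m") if quality == "M" else ((root + 3) % 12, "M")


TRIADS = [(root, quality) for root in range(12) for quality in "Mm"]
EDGES = {frozenset((t, f(t))) for t in TRIADS for f in (P, L, R)}


def lr_cycle(start=(0, "M")):
    """Apply L, R, L, R, ... 24 times; return the visited triads and the final triad."""
    visited, triad = [], start
    for step in range(24):
        visited.append(triad)
        triad = L(triad) if step % 2 == 0 else R(triad)
    return visited, triad


visited, end = lr_cycle()
print(len(TRIADS), len(EDGES))          # size of D(Ton)
print(len(set(visited)), end == (0, "M"))  # the LR cycle is Hamiltonian
```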

Finally, A(Ton) (Fig. 2) is a graph with 28 vertices, the 24 major and minor
triads plus the 4 augmented ones, and 60 edges. It represents nine types of
parsimonious voice-leading transformations: the neo-Riemannian P, L and R plus
six that map an augmented triad to a major or a minor one while preserving two
notes. D(Ton), the 3-regular graph that considers only major and minor triads
and PLR-transformations, and Douthett and Steinbach’s well-known Cube
Dance [6] are subgraphs of A(Ton). The former can be obtained by deleting the
four vertices labeled with the four augmented triads, the latter by cutting the
edges that represent R transformations. The Hamiltonicity of A(Ton) can be easily
derived from that of its subgraphs D(Ton) and Cube Dance. In fact, A(Ton)
has four more vertices than D(Ton), the four augmented triads; since they
are connected to major and minor triads that are connected to each other by R
transformations, and since the cycle that alternates L and R (cycle #41 in [1]) is
Hamiltonian in D(Ton), the four augmented triads lie in a Hamiltonian
cycle in A(Ton). In an even simpler way, the Hamiltonicity of A(Ton) can be
deduced from that of Cube Dance, which in turn is easy to check. In fact, in
a cube any pair of opposite vertices is connected by a (not unique) Hamiltonian
path. Four such paths form a Hamiltonian cycle in the Cube Dance graph, and
therefore in A(Ton) too.


[Figure: the graph A(Ton); the surviving vertex labels include the augmented
triads Ab aug and D aug.]

Fig. 2. A(Ton).


7 Conclusions 

Tone-networks and chord-networks have been introduced with the aim of offering
a paradigm, within the framework of graph theory, for representing
musical objects related to harmony. Two new music graphs have been introduced:
the iterated line graph and the triangles graph of a tone-network. The
Hamiltonicity of tone-networks and chord-networks has then been studied in the
general case, and two distinctive examples have been analyzed: twelve-tone rows,
and the Tonnetz with some of its chord-networks. Hamiltonicity has also served to
show the potential of the paradigm as a tool not only for music theorists but also
for composers. Future developments might try to extend the results to other
well-known music-theoretical graphs and to other graph properties, distinct
from Hamiltonicity, that carry some musical meaning.
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Abstract. The “rhythmic oddity property” (rop) was introduced by 
ethnomusicologist Simha Arom in the 1990s. The set of rop words is
the set of words over the alphabet {2, 3} satisfying the rhythmic oddity
property. It is not a subset of the set of Lyndon words, but is very close to it.
We show that there is a bijection between some necklaces and rop words.
This leads to a formula for counting the rop words of a given length.
We also propose a generalization of rop words over a finite alphabet
A ⊂ {1, 2, ..., s}, for some integer s > 2. The enumeration of these
generalized rop words is still open.


Keywords: Combinatorics on words • Lyndon words • Rhythmic oddity •
Music formalization


The rhythmic oddity property was discovered by ethnomusicologist Simha
Arom [2] in the study of the music of the Aka Pygmies. The rhythms satisfying this
property are a refinement of the aksak rhythms described by C. Brailoiu [5] in 1952,
and are also used in Turkish and Bulgarian music. They have been studied by G.T.
Toussaint in [11], M. Chemillier and C. Truchet in [7] and André Bouchet in
[4], who gave some important characterizations. Related problems of asymmetric
rhythms have been studied by Rachel Hall and Paul Klingsberg [8,9]. In this
paper, we carry on these studies by showing a one-to-one correspondence between
some necklaces and words satisfying the rhythmic oddity property. In the last
section, a generalization of rop words is proposed.

1 The Rhythmic Oddity Property 

Patterns with the rhythmic oddity property are combinations of durations equal to
2 or 3 units, such as the famous Aka Pygmy rhythm 32222322222, such
that when placing the sequence on a circle, “one cannot break the circle into
two parts of equal length whatever the chosen breaking point.” In other words,
no two onsets of the rhythm are located diametrically opposite each
other on the circle. In the language of combinatorics on words, this property is
defined in terms of words $\omega$ over the alphabet $A = \{2,3\}$ as follows. The height
$h(\omega)$ of a word $\omega = \omega_0\omega_1 \ldots \omega_{n-1}$ of length $n$ is by definition the sum of its
letters, $h(\omega) = \sum_{i=0}^{n-1} \omega_i$. A word $\omega$ satisfies the rhythmic oddity property
(rop) if $h(\omega)$ is even and no cyclic shift of $\omega$ can be factorized into two words
$uv$ such that $h(u) = h(v)$. For short, we call rop word a word over the alphabet
$\{2,3\}$ satisfying the rhythmic oddity property. For instance, the word 32322 of
height 12 is a rop word, as are all words of the form $32^n32^{n+1}$ for all
non-negative integers $n$ (the notation $2^n$ means that the letter 2 is repeated $n$
times). The properties of rop words have been outlined in [7]. The set of rop words
is not a subset of the set of Lyndon words: the words 222 and 233233233 are rop
words, but not Lyndon words. A Lyndon word (see for example the textbooks [10]
or [3]) is a string that is strictly smaller in lexicographic order than all of its
nontrivial rotations. Conversely, the set of Lyndon words is not included in the
set of rop words, since 2233 is a Lyndon word, but not a rop word (the words 23 and
32 have the same height and 2332 is a rotation of 2233). A Lyndon rop word is a
word of the monoid $\{2,3\}^*$ that is both a Lyndon word and a rop word. The
aim of this paper is to count the number of Lyndon rop words and the number
of rop words of length $n$.
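The two notions can be tested directly from their definitions. The sketch
below is illustrative only: words are assumed to be Python strings of digits,
and the rop test uses the equivalent formulation that no two onsets lie
diametrically opposite on the circle. The function names are hypothetical.

```python
def height(word):
    """Sum of the letters of a word written as a string of digits."""
    return sum(int(c) for c in word)


def is_rop(word):
    """Even height, and no two onsets diametrically opposite on the circle."""
    total = height(word)
    if total % 2:
        return False
    onsets, t = set(), 0
    for c in word:
        onsets.add(t)
        t += int(c)
    half = total // 2
    return all((o + half) % total not in onsets for o in onsets)


def is_lyndon(word):
    """Strictly smaller than every nontrivial rotation."""
    return all(word < word[i:] + word[:i] for i in range(1, len(word)))


for w in ("32322", "32222322222", "222", "233233233", "2233"):
    print(w, height(w), is_rop(w), is_lyndon(w))
```

Running it on the examples of this section reproduces the statements above:
222 and 233233233 are rop words without being Lyndon words, while 2233 is a
Lyndon word without being a rop word.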

2 Properties of rop Words 

Let $A$ denote a finite set of symbols. The elements of $A$ are called letters and
the set $A$ is called an alphabet. A word over an alphabet $A$ is an element of
the free monoid $A^*$ generated by $A$. The identity element $\varepsilon$ of $A^*$ is called the
empty word. A word $\omega \in A^*$ is written uniquely as $\omega = a_0a_1 \ldots a_{r-1}$ with letters
$a_j \in A$ for $j = 0, 1, \ldots, r-1$. The length of $\omega$ is $r$ and is denoted by $|\omega|$. If
$\omega \in A^*$ and $a \in A$ is a letter, $|\omega|_a$ denotes the number of occurrences of the
letter $a$ in the word $\omega$, so that

$$|\omega| = \sum_{a \in A} |\omega|_a \qquad (1)$$

If $\omega \in A^*$, and if the alphabet $A$ is a set of integers, the height $h(\omega)$ of
$\omega = a_0a_1 \ldots a_{r-1}$ is the sum of its letters

$$h(\omega) = \sum_{j=0}^{r-1} a_j = a_0 + a_1 + \cdots + a_{r-1} \qquad (2)$$

Until the last section, the alphabet will always be the set $A = \{2,3\}$ of two
letters. The cycle $\delta$ of $\omega$ is defined by $\delta(\varepsilon) = \varepsilon$ and $\delta(a\omega) = \omega a$, for $a \in \{2,3\}$.
The rotations of $\omega$ are the words $\delta^i(\omega)$, for all integers $i > 0$.

Definition 1. A word $\omega \in \{2,3\}^*$ satisfies the rhythmic oddity property (rop) if

(i) $h(\omega)$ is even

(ii) no cyclic shift of $\omega$ can be factorized into two words $uv$ such that $h(u) = h(v)$.

This definition excludes the trivial case of a word $\omega$ with odd height:
if the height of $\omega$ is odd, the second condition is automatically verified. In his
article [4] of 2010, André Bouchet gave the following characterization of rop
words.

Theorem 1. Let $\omega = \omega_0\omega_1 \cdots \omega_{n-1}$ be a word over the alphabet $A = \{2,3\}$ of
height $2h$. The word $\omega$ is a rop word if and only if the two conditions are satisfied:

(i) The length of $\omega$ is odd, say $2\ell + 1$.

(ii) The height of the prefixes of length $\ell$ of the rotations $\delta^i(\omega)$ of $\omega$ is equal to
$h - 2$ or $h - 1$.

The proof of this theorem is based on the two following properties. 

Proposition 1. Let $\omega$ be a rop word of length $n$ and let $i$ and $\lambda$ be two integers
such that $0 < \lambda < n$.

(1) If the height of the prefix of length $\lambda$ of the word $\delta^i(\omega)$ is $h - 1$ or $h - 2$, then
the height of the prefix of $\delta^{i+1}(\omega)$ is $h - 1$ or $h - 2$.

(2) The word $\omega$ has a prefix of height $h - 1$ or $h - 2$.

Summarising, the properties of rop words are the following.

Proposition 2. If $\omega \in \{2,3\}^*$ of height $2h$ satisfies the rhythmic oddity property
then

(i) the number $|\omega|_2$ of symbols 2 in $\omega$ is odd

(ii) the number $|\omega|_3$ of symbols 3 in $\omega$ is even

(iii) the length of $\omega$ is odd

(iv) $\delta^i(\omega)$ is a rop word for any $i$

(v) $\omega$ and $\delta^i(\omega)$ have a prefix of height $h - 1$ or $h - 2$.

Some of these properties are consequences of the previous results. Since the
height of $\omega$ satisfies $2|\omega|_2 + 3|\omega|_3 = 2h$, the number of occurrences of the letter 3
is even. Thus, since the length of $\omega$ is odd by Theorem 1, the number
of occurrences of the letter 2 is odd.
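The characterization of Theorem 1 and the parity statements of Proposition 2
lend themselves to an exhaustive check on short words. The following sketch is
only a sanity check under stated assumptions (words are tuples of integers,
lengths up to 9, hypothetical function names); it is not part of the proof.

```python
from itertools import product


def is_rop(word):
    """Definition 1, via diametrically opposite onsets."""
    total = sum(word)
    if total % 2:
        return False
    onsets, t = set(), 0
    for a in word:
        onsets.add(t)
        t += a
    return all((o + total // 2) % total not in onsets for o in onsets)


def satisfies_prefix_condition(word):
    """Conditions (i) and (ii) of Theorem 1 for a word of even height 2h."""
    n, total = len(word), sum(word)
    if n % 2 == 0 or total % 2:
        return False
    ell, h = n // 2, total // 2
    doubled = word + word  # lets us read prefixes of every rotation
    return all(sum(doubled[i:i + ell]) in (h - 2, h - 1) for i in range(n))


def check(max_len=9):
    """Compare both tests, and the parities of Proposition 2, on all short words."""
    for n in range(1, max_len + 1):
        for word in product((2, 3), repeat=n):
            if sum(word) % 2:
                continue
            if is_rop(word) != satisfies_prefix_condition(word):
                return False
            if is_rop(word) and (word.count(2) % 2 == 0 or word.count(3) % 2 == 1):
                return False
    return True


print(check())
```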

Furthermore, André Bouchet defines the $d$-pairing. Let $\omega$ be a word of length $n$
and $d$ be an integer such that $0 < d < n/2$. A $d$-pairing of $\omega$ is a partition of the
subset of indices $\{i : 0 \le i < n, \omega_i = 3\}$ into pairs of indices $\{j, j + d\}$. Arithmetic
operations on indices are to be understood mod $n$. Let $\omega = \omega_0\omega_1 \ldots \omega_{n-1}$ be
a word of $\{2,3\}^*$ and $d$ a positive integer coprime with $n$. Denote by
$\omega^{(d)} = x_0x_1 \ldots x_{n-1}$ the word obtained by reading all letters of $\omega$ by step $d$,
starting from $\omega_0$. Namely, each letter of $\omega^{(d)}$ is $x_j = \omega_k$ with $k = jd \bmod n$,
$0 \le j < n$. For instance, the word $\omega = 2233233$ depicted in Fig. 1 below, with $n = 7$
and $d = 3$, admits a 3-pairing and $\omega^{(3)} = 2333322$. As a corollary of the previous
result, A. Bouchet shows the following result, which is the theoretical meaning
of the Hop-and-jump algorithm given by Godfried T. Toussaint in [11].

Theorem 2. Let $\omega$ be a word of even height. The word $\omega$ is a rop word if and only if the
two conditions are satisfied:

(i) The length of $\omega$ is odd, say $2\ell + 1$.

(ii) $\omega$ admits an $\ell$-pairing.
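Step reading and the existence of a d-pairing are both easy to compute. The
sketch below assumes that d is coprime with the length of the word (which is
always the case for d = ℓ and length 2ℓ + 1), so that the indices j, j + d,
j + 2d, ... form a single cycle; a pairing then exists exactly when every
maximal run of 3s along that cycle has even length. Function names are
hypothetical, and the final loop only checks the statement of Theorem 2 on
short words.

```python
from itertools import product


def step_read(word, d):
    """The word obtained by reading `word` cyclically with step d, starting at index 0."""
    n = len(word)
    return "".join(word[(j * d) % n] for j in range(n))


def has_d_pairing(word, d):
    """True if the positions of the letter 3 split into pairs {j, j + d} (d coprime with n)."""
    n = len(word)
    threes = {i for i, c in enumerate(word) if c == "3"}
    if len(threes) == n:               # a single cycle made only of 3s
        return n % 2 == 0
    for j in threes:
        if (j - d) % n in threes:
            continue                   # j is not the start of a run
        run, k = 0, j
        while k in threes:             # walk the run j, j + d, j + 2d, ...
            run += 1
            k = (k + d) % n
        if run % 2:
            return False
    return True


def is_rop(word):
    total = sum(int(c) for c in word)
    if total % 2:
        return False
    onsets, t = set(), 0
    for c in word:
        onsets.add(t)
        t += int(c)
    return all((o + total // 2) % total not in onsets for o in onsets)


print(step_read("2233233", 3), has_d_pairing("2233233", 3))

agree = all(
    is_rop(w) == (n % 2 == 1 and has_d_pairing(w, n // 2))
    for n in range(2, 10)
    for w in map("".join, product("23", repeat=n))
    if sum(map(int, w)) % 2 == 0
)
print(agree)
```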
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Some years before, Marc Chemillier and Charlotte Truchet [7] gave another 
characterization of rop words by introducing asymmetric pairs. 

Definition 2. The words $(u, v)$ form an asymmetric pair if no pair of prefixes
$(u', v')$ of $u$ and $v$ respectively exists such that $h(u') = h(v') + 1$.

For instance, (2233, 233) is an asymmetric pair, but (2232, 333) is not since
the pair of prefixes (22, 3) verifies $h(22) = h(3) + 1$.
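Definition 2 translates into a comparison of prefix heights. In the sketch
below (hypothetical names, words given as strings of digits) the empty prefix
is included with height 0; over the alphabet {2, 3} this makes no difference,
since no prefix has height 1.

```python
def prefix_heights(word):
    """Heights of all prefixes of `word`, the empty prefix included."""
    heights, total = [0], 0
    for c in word:
        total += int(c)
        heights.append(total)
    return heights


def is_asymmetric_pair(u, v):
    """True if no prefixes u' of u and v' of v satisfy h(u') = h(v') + 1."""
    heights_v = set(prefix_heights(v))
    return all(hu - 1 not in heights_v for hu in prefix_heights(u))


print(is_asymmetric_pair("2233", "233"))
print(is_asymmetric_pair("2232", "333"))
```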

Theorem 3. A word $\omega$ satisfies the rhythmic oddity property if and only if there
exists an asymmetric pair $(u, v)$ such that $\omega = uv$ or $\omega = vu$ with $h(v) = h(u) + 2$.

A construction of asymmetric pairs, given by Chemillier and Truchet, uses
two functions from $A^* \times A^*$ into itself, namely

$$r(u,v) = (3u, 3v), \qquad s(u,v) = (v, 2u) \qquad (3)$$

and identifies any word $\omega$ of $A^*$ with a word $\alpha$ over $\{r, s\}^*$ by the map $\omega \to f(\omega)$
such that $f(\omega) = \alpha(\varepsilon, \varepsilon)$, where $\varepsilon$ is the empty word. For instance, the Cuban
tresillo rhythm represented by the word 332 is associated with the word $sr$, since
$(3, 32) = s(3, 3) = sr(\varepsilon, \varepsilon)$. The characterization of rop words $\omega$ is then moved
to a property of the associated word $\alpha$.

Theorem 4. A word $\omega$ satisfies the rhythmic oddity property if and only if there
exists a word $\alpha \in \{r, s\}^*$ with $|\alpha|_s$ being odd, such that $\delta^n(\omega) = uv$ or $\delta^n(\omega) = vu$
for some $n$ with $(u, v) = \alpha(\varepsilon, \varepsilon)$.

3 A Bijection Between Some Necklaces and rop Words 

Let $n_2$ and $n_3$ be the numbers of symbols 2 and 3 in $\omega$ and $n = n_2 + n_3$ be
the length of $\omega$. For a given $n_2$, we will use the $d$-pairing to show that there is
a one-to-one correspondence between aperiodic necklaces of length $n$ with $n_2$
black beads (represented by the letter 2) and $(n - n_2)$ white beads (represented by
the letter 3) and Lyndon rop words of length $n' = 2n - n_2$ with $n'_2 = n_2$ letters
2 and $n'_3 = 2(n - n_2)$ letters 3. There is also a one-to-one correspondence between
necklaces (possibly periodic) of length $n$ with $n_2$ black beads (represented by
the letter 2) and $(n - n_2)$ white beads (represented by the letter 3) and rop words.
The correspondence is obtained by adding or removing the letters 3 coming from the
pairing. Let us examine an example (see Fig. 1).

Fix $n_2$, for instance $n_2 = 3$, and let $n = 5$. The word 2233233 is a
(Lyndon) rop word with odd length 7 ($n'_2 = 3$, $n'_3 = 4$) since it has a 3-pairing.
Put the word on a circle, starting from the bottom and turning counterclockwise as
shown in Fig. 1. Now discard the second 3 of each pair (3, 3) turning
counterclockwise. Reading the remaining word clockwise starting from the bottom
gives the word 22332, one of the two necklaces of length 5 with 3 letters 2.
Conversely, starting from the word 22233, it is easy to add a 3-pairing by doubling
each letter 3, with respect to the counterclockwise tour.

Fig. 1. Cyclic representation of rop words


Thus, we can always transform a (resp. Lyndon) rop word $\omega$ of length $2\ell + 1$
by a one-to-one map $\phi$ such that the letters 3 in $\omega^{\flat} = \phi(\omega)$ are always coupled
by subwords 33. The bijection $\psi$ sending $2 \to 0$ and $33 \to 1$ maps $\omega^{\flat}$ to a word
$\omega' \in \{0,1\}^*$ corresponding to a (resp. aperiodic) necklace.

$$\omega \xrightarrow{\ \phi\ } \omega^{\flat} \xrightarrow{\ \psi\ } \omega'$$

Table 1 shows the first Lyndon rop words for $n_2 = 3$ and the corresponding
aperiodic necklaces. The first Lyndon rop word of Table 1 is the rhythm 22323,
sometimes called fume-fume and used by the Ewe people of Ghana. Conversely,
starting from the representative Lyndon word of an aperiodic necklace $\omega'$, we
construct the word $\omega^{\flat}$ by the bijection $\psi^{-1}$ sending $0 \to 2$ and $1 \to 33$, and
the word $\omega$ by applying $\phi^{-1}$. By construction, the height $h(\omega^{\flat})$ is even and
so is $h(\omega)$. Moreover, $\omega$ has an $\ell$-pairing and hence is a rop word.

Table 1. Correspondence for $n_2 = 3$

Aperiodic necklaces   n   Lyndon rop words   n
0001                  4   22323              5
00011                 5   2233233            7
00101                 5   2323233            7
000111                6   223332333          9
001011                6   232332333          9
001101                6   232333233          9
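One way to realize this correspondence in code, offered as a sketch under
explicit assumptions rather than as the construction intended in the text, is
to take for the coupling map the step reading with step ℓ described earlier:
in the step-read word the paired 3s become adjacent. The function names are
hypothetical. Because the direction in which the circle is read is a matter of
convention, an individual necklace may be sent to the rop word that the table
associates with its mirror image; the set of images is unaffected.

```python
def canonical(word):
    """Lexicographically smallest rotation, used as necklace representative."""
    return min(word[i:] + word[:i] for i in range(len(word)))


def is_rop(word):
    total = sum(int(c) for c in word)
    if total % 2:
        return False
    onsets, t = set(), 0
    for c in word:
        onsets.add(t)
        t += int(c)
    return all((o + total // 2) % total not in onsets for o in onsets)


def necklace_to_rop(necklace):
    """0 -> 2, 1 -> 33, then undo the step reading with step ell."""
    coupled = necklace.replace("0", "2").replace("1", "33")
    n = len(coupled)
    if n % 2 == 0:
        raise ValueError("the number of 0s must be odd")
    ell = n // 2
    letters = [""] * n
    for j, c in enumerate(coupled):
        letters[(j * ell) % n] = c
    return canonical("".join(letters))


def rop_to_necklace(word):
    """Step-read with step ell, then 33 -> 1 and 2 -> 0 on a suitably rotated word."""
    n = len(word)
    ell = n // 2
    coupled = "".join(word[(j * ell) % n] for j in range(n))
    for i in range(n):
        collapsed = (coupled[i:] + coupled[:i]).replace("33", "1")
        if "3" not in collapsed:       # every 3 was part of an aligned pair
            return canonical(collapsed.replace("2", "0"))
    raise ValueError("the letters 3 cannot be coupled: not a rop word")


for necklace in ("0001", "00011", "00101", "000111", "001011", "001101"):
    word = necklace_to_rop(necklace)
    print(necklace, word, is_rop(word), rop_to_necklace(word))
```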


4 Enumeration of the rop Words 


The number of necklaces (see [1] for details, and also [6] for applications to music
theory) with $n_2$ black beads and $n_3/2$ white beads derives from the generating
function of the action of the cyclic group

$$Z(C_{n_2}, x) = \frac{1}{n_2} \sum_{d \mid n_2} \varphi(d)\, x_d^{n_2/d} \qquad (4)$$

where the sum is over all divisors $d$ of $n_2$ and $\varphi$ is the Euler totient function,
according to the substitution of $x_j$ by $1/(1 - x^j)$. The development gives the
coefficients of $x^j$, which are precisely the numbers of necklaces with $n_2$ black beads
and $j$ white beads. For example, if $n_2 = p$ is prime, the development leads to the
following equations:

$$\begin{aligned}
Z(C_p, x) &= \frac{1}{p}\left(x_1^p + \varphi(p)\, x_p\right) \\
&= \frac{1}{p}\,\frac{1}{(1-x)^p} + \frac{p-1}{p}\,\frac{1}{1-x^p} \\
&= \frac{1}{p}\left(1 + \sum_{n=1}^{\infty} \frac{p(p+1)\cdots(p+n-1)}{n!}\, x^n\right) + \frac{p-1}{p} \sum_{n=0}^{\infty} x^{np} \\
&= 1 + \frac{1}{p} \sum_{n=1}^{\infty} \binom{p+n-1}{n} x^n + \frac{p-1}{p}\left(x^p + x^{2p} + x^{3p} + \cdots\right) \\
&= 1 + \sum_{n=1}^{\infty} a_n x^n
\end{aligned} \qquad (5)$$

with

$$a_n = \begin{cases} \dfrac{1}{p}\dbinom{p+n-1}{n} & \text{if } n \not\equiv 0 \bmod p, \\[2ex] \dfrac{1}{p}\left(\dbinom{p+n-1}{n} + p - 1\right) & \text{if } n \equiv 0 \bmod p. \end{cases} \qquad (6)$$

Table 2, with $n_2$ on the horizontal axis and $n_3$ on the vertical axis, shows the
number of rop words for $n_2$ and $n_3$ fixed. The number of rop words of length
$n$ is given by summing along the diagonal $n_2 + n_3 = n$. Each column of the
table is obtained from the development of the generating function $Z(C_{n_2}, x)$.
For $n_2$ prime, the coefficients agree with the formula of $a_n$ given above. In each
column, we recover the number of binary necklaces with length $n_2 + q$ and density
$q = n_3/2$ given by the right-hand side of the next formula. From the bijection of
the previous section, it follows that the number $R(n_2, n_3)$ of rop words with $n_2$
symbols 2 and $n_3$ symbols 3 is the number of binary necklaces of length $n_2 + n_3/2$
with fixed density $n_3/2$,

$$R(n_2, 2q) = \frac{1}{n_2 + q} \sum_{d \mid \gcd(n_2+q,\, q)} \varphi(d) \binom{(n_2+q)/d}{q/d}, \qquad q = 1, 2, 3, \ldots \qquad (7)$$

Table 2. Number of rop words $R(n_2, n_3)$

n_3 \ n_2    1    3    5     7     9    11     13     15     17
    2        1    1    1     1     1     1      1      1      1
    4        1    2    3     4     5     6      7      8      9
    6        1    4    7    12    19    26     35     46     57
    8        1    5   14    30    55    91    140    204    285
   10        1    7   26    66   143   273    476    776   1197
   12        1   10   42   132   335   728   1428   2586   4389
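Formula (7) can be compared with a direct enumeration for small parameters.
The sketch below is a check under stated assumptions, not the computation used
for the table: rop words are counted up to rotation by brute force, only odd
values of n_2 are considered, and all names are hypothetical.

```python
from itertools import combinations
from math import comb, gcd


def totient(n):
    """Euler's totient function, computed naively."""
    return sum(1 for k in range(1, n + 1) if gcd(k, n) == 1)


def rop_count_formula(n2, n3):
    """Right-hand side of formula (7) with q = n3 / 2."""
    q = n3 // 2
    length = n2 + q
    g = gcd(length, q)
    total = sum(totient(d) * comb(length // d, q // d)
                for d in range(1, g + 1) if g % d == 0)
    return total // length


def is_rop(word):
    total = sum(word)
    if total % 2:
        return False
    onsets, t = set(), 0
    for a in word:
        onsets.add(t)
        t += a
    return all((o + total // 2) % total not in onsets for o in onsets)


def rop_count_bruteforce(n2, n3):
    """Number of rop words with n2 letters 2 and n3 letters 3, counted up to rotation."""
    n = n2 + n3
    classes = set()
    for positions in combinations(range(n), n3):
        word = tuple(3 if i in positions else 2 for i in range(n))
        if is_rop(word):
            classes.add(min(word[i:] + word[:i] for i in range(n)))
    return len(classes)


for n2 in (1, 3, 5):
    print(n2, [(rop_count_formula(n2, n3), rop_count_bruteforce(n2, n3))
               for n3 in (2, 4, 6)])
```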


By computing Lyndon words on the alphabet {2, 3} and deleting those which are
not rop words, we get Table 3 of the number of Lyndon rop words for $n_2$ and
$n_3$ fixed, with $n_2$ on the horizontal axis and $n_3$ on the vertical axis. The total
number of Lyndon rop words of length $n$ is obtained by summing along the diagonal
$n_2 + n_3 = n$. The differences between Tables 2 and 3 are marked with an asterisk.
In each column of Table 3, we recover the number of fixed density Lyndon words given
by the following formula, with $n_3 = 2q$. It follows from the previous section that
the number $L(n_2, n_3)$ of Lyndon rop words with $n_2$ symbols 2 and $n_3$ symbols
3 is

$$L(n_2, 2q) = \frac{1}{n_2 + q} \sum_{d \mid \gcd(n_2+q,\, q)} \mu(d) \binom{(n_2+q)/d}{q/d} \qquad (8)$$

where $\mu$ is the Möbius function.

Table 3. Number of Lyndon rop words $L(n_2, n_3)$

n_3 \ n_2    1    3    5     7     9    11     13     15     17
    2        1    1    1     1     1     1      1      1      1
    4        1    2    3     4     5     6      7      8      9
    6        1    3*   7    12    18*   26     35     45*    57
    8        1    5   14    30    55    91    140    204    285
   10        1    7   25*   66   143   273    476    775*  1197
   12        1    9*  42   132   333*  728   1428   2584*  4389


By summing these formulas along a diagonal $n = n_2 + n_3$, we get the number
$\mathcal{L}_n$ of Lyndon rop words of length $n$ and the number $\mathcal{R}_n$ of rop words of length $n$:

$$\mathcal{L}_n = \sum_{n_2+n_3=n} L(n_2, n_3) = \sum_{p=0}^{(n-3)/2} L(2p+1,\, n-2p-1) \qquad (9)$$

and

$$\mathcal{R}_n = \sum_{n_2+n_3=n} R(n_2, n_3) = 1 + \sum_{p=0}^{(n-3)/2} R(2p+1,\, n-2p-1) \qquad (10)$$

These numbers are tabulated in Table 4. If $n$ is prime, the difference between the
cardinalities of the two sets is 1, since the word $2^n$ (where the letter 2 is repeated
$n$ times) is a rop word but not a Lyndon word. If $n$ is a product or a power of
primes, some periodic words appear that are rop words but not Lyndon words.
This explains the differences between the set of rop words and the set of Lyndon
rop words. For instance, if $n = 9$, $(233)^3$ is a rop word but not a Lyndon word.
The same is true for the words $(22323)^3$, $(233)^5$ and $(23333)^3$ of length 15. For
$n_2 = 9$ and $n_3 = 12$, there are 333 Lyndon rop words and 335 rop words. The
two non-Lyndon rop words are $(2233233)^3$ and $(2323233)^3$.
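The diagonal sums (9) and (10) are straightforward to evaluate from formulas
(7) and (8). The sketch below uses hypothetical function names and naive
implementations of the Euler totient and Möbius functions; it reproduces the
first columns of Table 4 and is not meant as an efficient implementation.

```python
from math import comb, gcd


def totient(n):
    return sum(1 for k in range(1, n + 1) if gcd(k, n) == 1)


def mobius(n):
    """Möbius function by trial division."""
    result, p = 1, 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0        # square factor
            result = -result
        p += 1
    return -result if n > 1 else result


def fixed_density(n2, n3, weight):
    """Common shape of formulas (7) and (8), with q = n3 / 2."""
    q = n3 // 2
    length = n2 + q
    g = gcd(length, q)
    return sum(weight(d) * comb(length // d, q // d)
               for d in range(1, g + 1) if g % d == 0) // length


def rop_words(n):
    """Formula (10): the leading 1 counts the word 2^n."""
    return 1 + sum(fixed_density(2 * p + 1, n - 2 * p - 1, totient)
                   for p in range((n - 3) // 2 + 1))


def lyndon_rop_words(n):
    """Formula (9)."""
    return sum(fixed_density(2 * p + 1, n - 2 * p - 1, mobius)
               for p in range((n - 3) // 2 + 1))


for n in range(3, 21, 2):
    print(n, rop_words(n), lyndon_rop_words(n))
```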


Table 4. Numbers of Lyndon rop words $\mathcal{L}_n$ and rop words $\mathcal{R}_n$ of length $n$

n      3   5   7   9   11   13   15   17    19    21     23     25      27
R_n    2   3   5   10  19   41   94   211   493   1170   2787   6713   16274
L_n    1   2   4   8   18   40   90   210   492   1164   2786   6710   16264


5 Generalized rop Words 

Let $s$ be an integer $> 2$. The rhythmic oddity property could be generalized over
any alphabet of positive integers as follows.

Definition 3. Let $A$ be an alphabet $A \subset \{1, 2, \ldots, s\}$. A word $\omega \in A^*$ is a
generalized rop word of parameters $(p, q)$, or a $(p, q)$-grop word for short, if

(i) $h(\omega) \equiv 0 \bmod p$

(ii) No cyclic shift of $\omega$ can be factorized into $q$ words $u_1, u_2, \ldots, u_q$ such that

$$h(u_1) = h(u_2) = \cdots = h(u_q)$$

A Lyndon grop word is both a grop word and a Lyndon word. For instance,
if $A = \{2,3\}$ the first grop words of parameters (2, 3) are: 2223, (3333),
22233, 22323, (33333), 222333, 2222223, 2232333, 2233233, 2233323, 2323233,
(3333333). Non-Lyndon words are given in parentheses. A computation of the
number of the first (2, 3)-grop words of length $n$ is shown in Table 5. The first
(2, 4)-grop words are: 22233, 22323*, (222222), 223333, 232333, (233233), 2222233,
2222323, 2223223*, 2333333*. Non-Lyndon words are in parentheses. The star
indicates (2, 2)-rop words. A computation of the number of the first (2, 4)-grop
words of length $n$ is shown in Table 6. Over the alphabet {1, 2, 3}, Olivier
Messiaen uses in Cinq Rechants the Indian rhythm simhavikrama 2221323. It is a
(3, 2)-grop word, but not a (2, 2)-grop word. And for instance, the words 111
and 333 are not (2, 3)-grop words, but 123 and 132 are. The word 11133 is not a
(2, 3)-grop word since the subwords $u_1 = 111$, $u_2 = 3$ and $u_3 = 3$ have the same
height, but the word 11313 is a grop word. For instance, 11133, 11313, 11322 are
Lyndon (2, 3)-grop words.
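A brute-force enumeration of generalized rop words can be sketched as follows.
To stay neutral about the roles of the two parameters, the function takes the
height modulus and the number of parts as separate named arguments; this
interface and all names are assumptions made for the illustration. Taking both
equal to 3 (resp. 4) over {2, 3} reproduces the words listed above and the
first entries of Tables 5 and 6, and taking both equal to 2 gives back the
ordinary rop words.

```python
from itertools import product


def has_equal_split(word, parts):
    """True if some cyclic shift factorizes into `parts` words of equal height."""
    total = sum(word)
    if total % parts:
        return False
    step = total // parts
    onsets, t = set(), 0
    for a in word:
        onsets.add(t)
        t += a
    return any(all((o + k * step) % total in onsets for k in range(parts))
               for o in onsets)


def is_grop(word, modulus, parts):
    """Height divisible by `modulus` and no cyclic split into `parts` equal-height words."""
    return sum(word) % modulus == 0 and not has_equal_split(word, parts)


def grop_necklaces(alphabet, n, modulus, parts):
    """Grop words of length n up to rotation, each given by its smallest rotation."""
    return {min(w[i:] + w[:i] for i in range(n))
            for w in product(alphabet, repeat=n) if is_grop(w, modulus, parts)}


def is_lyndon(word):
    return all(word < word[i:] + word[:i] for i in range(1, len(word)))


for n in range(4, 8):
    words = grop_necklaces((2, 3), n, modulus=3, parts=3)
    print(n, len(words), sum(is_lyndon(w) for w in words))

messiaen = (2, 2, 2, 1, 3, 2, 3)  # simhavikrama
print(is_grop(messiaen, modulus=3, parts=2), is_grop(messiaen, modulus=2, parts=2))
```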


Table 5. Number of (2, 3)-grop words $\mathcal{R}_n^{(3)}$ and Lyndon (2, 3)-grop words $\mathcal{L}_n^{(3)}$ over {2, 3}

n          4   5   6   7   8    9   10   11   12   13    14    15    16    17
R_n^(3)    2   3   1   6   11   6   25   46   41   117   232   278   631   1237
L_n^(3)    1   2   1   5   9    6   22   45   40   116   226   278   620   1236


Table 6. Number of (2, 4)-grop words $\mathcal{R}_n^{(4)}$ and Lyndon (2, 4)-grop words $\mathcal{L}_n^{(4)}$ over {2, 3}

n          5   6   7   8   9    10   11   12   13    14    15    16    17
R_n^(4)    2   4   4   5   13   27   47   50   131   284   479   685   1450
L_n^(4)    2   2   4   5   12   24   47   50   131   279   473   683   1440


In the following, we consider grop words of parameters (2, 2) over the alphabet
$\{a, b\}$ with $a$ and $b$ coprime positive integers in $\{1, 2, \ldots, 9\}$. Most of the
previous results cannot be extended. For example, if we consider words over the
alphabet {1, 4} instead of {2, 3}, the number of rop words of a given length is
the same: there are 3 rop words of length 5 (22222, 22323, 23333) over {2, 3} and
3 rop words of length 5 (11444, 14144, 44444) over the alphabet {1, 4}. However,
there is no trivial bijection between the two sets. The criterion of $d$-pairing does
not work, and neither does that of asymmetric pairs. Nevertheless, we have the
following result.

Proposition 3. If $\omega$ is a (2, 2)-grop word over the alphabet $\{a, b\}$ with $a$ and $b$
positive coprime integers, $a < b$, then there exists a unique pair $(u, v)$ with

$$h(v) = h(u) + 2m \qquad (11)$$

such that $\omega = uv$ or $\omega = vu$, for some $m \in \{1, 2, \ldots, \lfloor b/2 \rfloor\}$.

Proof. The uniqueness of the factorization is trivial. To prove the existence of
such a factorization, consider the first longest prefix $u$ of $\omega = uv$ such that
$h(u) > h(v)$. Denote by $x$ the last symbol of $u$, $\omega = u'xv$. Since $u$ is maximal,
$x + h(v) > h(v')$ and $h(u) < h(v') + x$. The rhythmic oddity property implies
$x + h(v) > h(v')$. Thus,

$$-x < h(u) - h(v') < x$$

Let $t$ be an integer such that $h(u) - h(v') = t$. The value of this integer depends
on $x$ and varies between 0 and $\pm(b - 1)$. The height of $v$ is then $h(v) = t + h(v') =
h(u) - x + t$. Since the height $h(\omega) = h(u) + h(v) = 2h(u) + t - x$ is even because
$\omega$ is a rop word, $x - t$ is even, and equal to $2m$ for some positive integer $m$ at
most $\lfloor b/2 \rfloor$.

For example, consider the alphabet {2, 7}. The rop word 222 has a unique
pair (2, 22) such that $h(22) = h(2) + 2$. The rop word $\omega = 22277$ has a unique
pair $v = 2227$, $u = 7$ such that $\omega = vu$ and $h(v) = h(u) + 6$.
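The two examples can be reproduced by listing, for a given word, all pairs
(u, v) that satisfy the conditions of Proposition 3. The sketch below uses a
hypothetical function name and assumes single-digit letters so that words can
be written as strings.

```python
def height(word):
    return sum(int(c) for c in word)


def unbalanced_pairs(word, b):
    """Pairs (u, v) with word = uv or word = vu and h(v) = h(u) + 2m, 1 <= m <= b // 2."""
    pairs = set()
    for k in range(1, len(word)):
        left, right = word[:k], word[k:]
        for u, v in ((left, right), (right, left)):
            diff = height(v) - height(u)
            if diff > 0 and diff % 2 == 0 and diff // 2 <= b // 2:
                pairs.add((u, v))
    return pairs


print(unbalanced_pairs("222", b=7))
print(unbalanced_pairs("22277", b=7))
```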

Proposition 4. Let $\omega = uv$ be a factorization of a word such that $h(v) =
h(u) + 2m$, for some $m$. There exists a pair of prefixes $(u', v')$ of $(u, v)$, namely
$u = u'u''$ and $v = v'v''$, such that $h(v') = h(u') + m$ if and only if there exists a
cyclic shift $v''u'u''v'$ such that $h(v''u') = h(u''v')$.

Proof. The proof is just a formal computation.

$$\begin{aligned}
h(v') = h(u') + m &\iff 2h(v') = 2h(u') + 2m \\
&\iff h(u') - h(v') + 2m = h(v') - h(u') \\
&\iff h(v'') - h(u'') = h(v') - h(u') \\
&\iff h(v''u') = h(u''v').
\end{aligned}$$

We end this section by a computation with Maple Software of the number of
rop words of parameters (2, 2) over the alphabet $\{a, b\}$. If $(a + b)$ is odd (resp.
even) the length of rop words is odd (resp. even). By comparing Table 4 and
Table 7, the computation shows that the number of rop words over {2, 3} is the
same as the number of rop words over {1, 4}. In Table 7, we compute the number
of (2, 2)-grop words of length $n$ over $\{a, b\}$ for $n$ and $(a + b)$ odd.


Table 7. Number of (2, 2)-grop words of odd length n over {a, b}

{a, b} \ n   3   5    7    9   11    13     15     17      19      21
{1, 4}       2   3    5   10   19    41     94    211     493    1170
{1, 6}       2   4    9   23   59   162    459   1308    3802   11179
{1, 8}       2   4   10   29   85   262    823   2596    8290   26684
{2, 9}       2   4   10   30   93   305   1019   3416   11554   39281
{4, 9}       2   4   10   30   94   315   1083   3752   13135   46235
{7, 8}       2   4   10   30   94   316   1095   3841   13663   48990
{8, 9}       2   4   10   30   94   316   1096   3855   13781   49770
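For short lengths these counts can be recomputed by brute force. The sketch
below is written in Python rather than Maple and is only an illustration with
hypothetical names: it counts (2, 2)-grop words up to rotation over a
two-letter alphabet, and prints the first values for a few alphabets so that
they can be compared with each other and with the table.

```python
from itertools import product


def is_rop(word):
    """(2, 2)-grop test: even height and no two diametrically opposite onsets."""
    total = sum(word)
    if total % 2:
        return False
    onsets, t = set(), 0
    for a in word:
        onsets.add(t)
        t += a
    return all((o + total // 2) % total not in onsets for o in onsets)


def count_rop_necklaces(alphabet, n):
    """Number of (2, 2)-grop words of length n over `alphabet`, counted up to rotation."""
    classes = set()
    for word in product(alphabet, repeat=n):
        if is_rop(word):
            classes.add(min(word[i:] + word[:i] for i in range(n)))
    return len(classes)


for alphabet in ((2, 3), (1, 4), (1, 6), (2, 5), (3, 4)):
    print(alphabet, [count_rop_necklaces(alphabet, n) for n in (3, 5, 7, 9)])
```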


In Table 8, the same is done for $n$ and $(a + b)$ even. None of these
sequences is referenced in the database of Neil J. Sloane.

Table 8. Number of (2, 2)-grop words of even length n over {a, b}

{a, b} \ n   2   4   6    8   10    12    14     16     18      20
{1, 5}       1   2   5   10   25    62   157    410   1097    2954
{1, 7}       1   2   6   15   44   128   378   1138   3478   10712
{1, 9}       1   2   6   16   51   162   521   1698   5586   18464
{5, 7}       1   2   6   16   52   171   574   1958   6742   23309
{5, 9}       1   2   6   16   52   172   585   2034   7167   25418
{7, 9}       1   2   6   16   52   172   586   2047   7270   26064

Perspectives. The next step of this study is to prove the following statement.
The number of (2, 2)-grop words of length $n$ over the alphabet $\{a, b\}$, with $a$, $b$
positive coprime integers, is equal to the number of (2, 2)-grop words of length $n$
over $\{a', b'\}$ for all $n$ if and only if

$$a + b = a' + b' \qquad (12)$$

For instance, the computation shows that the numbers of (2, 2)-grop words of
a given length over the alphabets {1, 6}, {2, 5} or {3, 4} are equal. The problem is
still open.

Acknowledgements. The author thanks Marc Chemillier and Andre Bouchet for 
stimulating discussions and Harald Fripertinger for valuable comments and remarks. 
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Abstract. Musical relations and dependencies between events within a 
musical passage may be better explained as a graph rather than in a
sequential framework. This article develops a multiscale structure model
for music segments, called Polytopic Graph of Latent Relations (PGLR),
as a way to describe nested systems of latent dependencies within the
musical flow. The approach is presented conceptually and algorithmically,
together with an extensive evaluation on a large set of chord
sequences from a corpus of pop songs. Our results illustrate the efficiency
of the proposed model in capturing structural information within
such data.


1 Presentation 

It is commonly accepted that listeners do not perceive music as a mere
sequence of sounds, nor do composers conceive their works as such. Music is
essentially the result of patterns whose inner organization and mutual relationships
contribute to the overall structure of the musical content, at different time-scales
simultaneously.

What exactly music structure is remains an open scientific question. This
article is a contribution towards one particular aspect of music structure: it
proposes and investigates a multiscale model of the inner organization of musical
segments, which we call Polytopic Graph of Latent Relations (PGLR).

The musical content observed at a given instant t within a music segment
obviously tends to share privileged relationships with its immediate past, hence
the sequential perception of the music flow. But music content at instant t also
relates to distant events which have occurred in the longer term past, especially
at instants which are metrically homologous to t, in previous bars, motifs,
phrases, etc. This is particularly evident in strongly “patterned” music, such
as pop music, where recurrence and regularity play a central role in the design
of cyclic musical repetitions, anticipations and surprises. But it is also
discernible in a number of other music genres, which rely abundantly on all sorts of
multiscale similarities, progressions, expectations and denials.

To overcome the limitations of purely sequential models in music content
descriptions, hierarchical models are often resorted to, in order to provide a
representation framework for the grouping structure of a musical passage. The
most famous hierarchical approach is undoubtedly the Generative Theory of
Tonal Music (GTTM) by Lerdahl and Jackendoff [8], which has been for many
years a source of inspiration for a wide variety of work in music structure
modeling. However, hierarchical approaches such as GTTM rely axiomatically on an
adjacency hypothesis, under which the grouping of elements into a higher level
object is strictly limited to neighbouring units.

In this work, we develop a different view as regards the structural association
of elements forming music segments. We describe the “web” of musical
elements as a Polytopic Graph of Latent Relations (PGLR) which models
relationships developing predominantly between homologous elements within the
metrical grid.

For most segments of $2^n$ events, the PGLR lives on an n-dimensional cube 
(square, cube, tesseract, etc.), n being the number of scales considered 
simultaneously in the multiscale model. By extension, the PGLR can be generalized to 
a more or less regular n-polytope. 

Each vertex in the polytope corresponds to a low-scale musical element, each 
edge represents a relationship between two vertices and each face forms an 
elementary system of relationships. In addition, one variant of the proposed model 
views the last vertex in each elementary system as the denied realization of a 
(virtual) expected element, itself resulting from the implication triggered by the 
combination of former relationships within the system. 

The estimation of the PGLR structure of a musical segment can be obtained 
computationally as the joint estimation of: 

1. the description of the polytope (as a more or less regular n-polytope), 

2. the nesting configuration of the graph over the polytope, reflecting the flow 
of dependencies and interactions between the elementary implication systems 
within the musical segment (this flow being assumed to be causal), 

3. the set of relations between the nodes of the graph, with potentially multiple 
possibilities which need to be disambiguated (hence the “latent” nature of the 
relations, as they are not actually observed). 

In this paper, the shape of the polytope is assumed to be a tesseract (4-cube) 
and we focus our study on the modeling of meter-synchronous chord sequences of 
16 chords. However, the general framework encompassed by PGLR is potentially 
applicable to many other musical dimensions (rhythm, melody, etc.) as soon as 
relevant latent relations can be defined. 

In Sect. 2, we introduce the main concepts and formalism related to the 
model. Section 3 covers computational aspects of the approach, including 
optimality criteria and algorithmic design. In Sect. 4, we present a series of 
experimental results which assess the advantages of the PGLR model. We conclude 
with perspectives outlined by the proposed approach. 


240 C. Louboutin and F. Bimbot 

2 Concepts and Formalism 

2.1 Chord Representation and Relations 

Strictly speaking, a chord, in music, is any harmonic set of notes (or “pitches”) 
that are heard as if sounding simultaneously. However, in tonal western music, 
chords are more generally conceived as sets of pitch classes supporting the local 
harmonic groundplan of the music. In particular, chords play a strong role in the 
accompaniment of the melody in pop songs. The most frequently encountered 
chords are triads (i.e. sets of three pitch classes), with a predominance of major 
and minor triads. More sophisticated chords contain combinations of 4 pitch 
classes or even more. 

Chords can be represented in various ways. In this article, we consider two 
types of representations: (i) the complete set of pitch classes forming the chord 
(PC description) and (ii) the tabular notation of the major or minor triadic 
reduction of the chord (TR description). Assuming 4 or 5 pitch classes per chord, 
this potentially leads to several hundred different PC descriptions (far fewer 
in practice), but only 24 distinct TRs. 

A number of formalisms exist to describe chord relations, either in classical 
musicology (through chromatic relations or via the circle of fifths) or in 
the framework of more recent theories, in particular Weitzmann regions [21] or 
neo-Riemannian theory [3]. Tymoczko [18,19] also proposes a model based on 
combinations of chromatic and scalar transpositions. 

Depending on the formalism under consideration, the property of uniqueness 
of the relation between two chords may or may not be satisfied. 


Triad Circles. We call triad circle any circular arrangement of triads aimed 
at reflecting some proximity relationship between triads along its circumference. 
The circle of thirds is formed by alternating major and minor triads, with 
neighbouring triads sharing two common pitch classes. The circle is shown in Fig. 1. 
This representation provides a way to express the relationship between two TRs 
- in a unique way - as the angular displacement around the circle. Alternatively, 
the chromatic circle is arranged according to a chromatic progression (not 
represented in Fig. 1). 
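
As an illustration, a circle of thirds can be generated programmatically. The sketch below is a minimal example under stated assumptions: triads are encoded as (root pitch class, mode) pairs, the circle starts on C major and moves to the relative minor first, and the function names are hypothetical. The starting point and orientation need not match those of Fig. 1.

```python
# Minimal sketch of a circle of thirds. The encoding, starting triad and
# orientation are illustrative assumptions, not taken from the paper.

def pitch_classes(triad):
    """Return the set of pitch classes of a major ('M') or minor ('m') triad."""
    root, mode = triad
    third = 4 if mode == "M" else 3
    return {root % 12, (root + third) % 12, (root + 7) % 12}

def circle_of_thirds(start=(0, "M")):
    """Alternate major and minor triads so that neighbours share two pitch classes."""
    circle = [start]
    while len(circle) < 24:
        root, mode = circle[-1]
        if mode == "M":
            circle.append(((root - 3) % 12, "m"))  # relative minor, e.g. C -> Am
        else:
            circle.append(((root - 4) % 12, "M"))  # e.g. Am -> F
    return circle

def displacement(circle, x, y):
    """Angular displacement, in steps along the circle, from triad x to triad y."""
    return (circle.index(y) - circle.index(x)) % len(circle)

CIRCLE = circle_of_thirds()
```

Because each of the 24 triads occupies exactly one position, `displacement` returns a single value for any ordered pair of triads, which is the uniqueness property mentioned above.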


Optimal Transport. If two chords X and Y are represented as sets of pitch 
classes $\{x_i\}$ and $\{y_j\}$, the set of transports between X and Y can be defined as: 

$T = \{ t_k = (x_{i_k}, y_{j_k}) \mid x_{i_k} \in X,\ y_{j_k} \in Y \}$ (1) 

that is, pairs of notes across the two chords indexed by an integer k which 
represents a virtual mapping between their respective pitch classes. This is a 
simplified model that can be used to represent “voices” in chord sequences. We 
consider complete transports, i.e. each note is associated with at least one voice. 
Examples of transports are given in Fig. 2. 
Fig. 1. Triads: circle of thirds. 


Fig. 2. Two possible transports 
between C and Fm. 


The cost of a transport is defined as the sum of the costs associated with 
each pair of notes in the transport: $|T| = \sum_{(x,y) \in T} |d(x, y)|$. In this paper we use 
two types of distances: 

- the chromatic distance (or smoothness) [3,9,16], which is the shortest 
displacement in semitones from pitch class x to pitch class y. In Fig. 2 the first 
transport is minimal for the chromatic distance (cost equal to 2). 

- the harmonic distance, where the displacement is considered on the circle of 
fifths instead of the chromatic scale. In Fig. 2, the second transport is minimal 
for the harmonic distance (cost equal to 6). 
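
The two distances and the search for a minimal transport can be sketched as follows. This is a simplified illustration, not the authors' implementation: it assumes pitch classes encoded as integers 0-11, restricts the search to one-to-one transports between chords of equal size (the text allows any complete transport), and uses hypothetical function names.

```python
from itertools import permutations

def chromatic(x, y):
    """Shortest displacement in semitones between two pitch classes."""
    d = (y - x) % 12
    return min(d, 12 - d)

def harmonic(x, y):
    """Shortest displacement between two pitch classes along the circle of fifths."""
    d = (7 * (y - x)) % 12  # multiplying by 7 converts semitones into fifth-steps
    return min(d, 12 - d)

def minimal_transport(X, Y, dist):
    """Brute-force the cheapest one-to-one transport between equal-sized chords."""
    best = None
    for perm in permutations(Y):
        pairs = list(zip(X, perm))
        cost = sum(dist(x, y) for x, y in pairs)
        if best is None or cost < best[0]:
            best = (cost, pairs)
    return best

C_MAJOR = (0, 4, 7)  # C, E, G
F_MINOR = (5, 8, 0)  # F, Ab, C

chromatic_cost, chromatic_pairs = minimal_transport(C_MAJOR, F_MINOR, chromatic)
harmonic_cost, harmonic_pairs = minimal_transport(C_MAJOR, F_MINOR, harmonic)
```

Under these assumptions the search yields a cost of 2 for the chromatic distance (C to C, E to F, G to Ab) and a cost of 6 for the harmonic distance, in agreement with the values quoted for Fig. 2.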


2.2 Systemic Organization 

Based on the hypothesis that the relations between musical elements in a 
segment are not necessarily sequential, the System & Contrast (S&C) model has 
been recently formalized [1] as a generalization and an extension of Narmour’s 
Implication-Realization model [14]. Its applicability to various music genres for 
multidimensional and multiscale music analysis has been explored in [4] and 
algorithmically implemented in an early version as “Minimal Transport Graphs” [10]. 

The S&C model primarily assumes that relations between 4 elements in a 
musical segment $x_0\, x_1\, x_2\, x_3$ can be viewed as relying on a matrix-based system 
of relations in reference to the first element $x_0$ (the primer), which thus plays 
the role of a common antecedent to all other elements in the system. This is the 
basic principle that enables the joint modeling of two timescales simultaneously. 

Moreover, in the S&C approach, it is further assumed that latent 
relationships $x_1 = f(x_0)$ and $x_2 = g(x_0)$ trigger a process of implication: 

$x_0,\ f(x_0),\ g(x_0) \;\overset{\text{implies}}{\Longrightarrow}\; g(f(x_0)) = \hat{x}_3$ 

Virtual element $\hat{x}_3$ may be more or less strongly denied by a contrast: $x_3 \neq \hat{x}_3$, 
which creates a potential closure to the segment. 
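
The implication mechanism can be made concrete with a small sketch. It assumes, purely for illustration, that triads are encoded as (root pitch class, mode) pairs and that a relation is a root shift in semitones combined with an optional mode flip; the paper's actual relations (triad circle rotations or transports) are richer, and all names below are hypothetical.

```python
def relation(x, y):
    """Relation taking triad x to triad y: (root shift in semitones, mode flipped?)."""
    return ((y[0] - x[0]) % 12, x[1] != y[1])

def apply_relation(rel, x):
    """Apply a relation to a triad encoded as (root pitch class, mode)."""
    shift, flip = rel
    mode = x[1]
    if flip:
        mode = "m" if mode == "M" else "M"
    return ((x[0] + shift) % 12, mode)

def virtual_element(x0, x1, x2):
    """Implied fourth element g(f(x0)), where x1 = f(x0) and x2 = g(x0)."""
    f = relation(x0, x1)
    g = relation(x0, x2)
    return apply_relation(g, apply_relation(f, x0))

# First four chords of the Master Blaster example of Sect. 2.3: Cm Cm Cm Bb
Cm, Bb = (0, "m"), (10, "M")
expected = virtual_element(Cm, Cm, Cm)  # f and g are both the identity here
is_contrast = expected != Bb            # the observed Bb denies the expected Cm
```

In this toy case the system Cm Cm Cm implies a fourth Cm, and the observed Bb acts as the contrast that closes the segment.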
Table 1. Antecedent function for the various models. 

Sequential:         $\phi_{Seq}(x_i) = x_{i-1}$ 
Systemic:           $\phi_{Sys}(x_i) = x_0$ 
System & Contrast:  $\phi_{S\&C}(x_i) = x_0$ if $i = 1, 2$; $g(f(x_0))$ if $i = 3$ 


As depicted in Table 1, the sequential, systemic and S&C models studied in this 
article are all first-order models which assume different antecedent functions 
between the elements forming a musical segment. It is worth noting that the 
antecedent function summarizes the entire history of $x_i$ into a single element. 

2.3 Polytopic Representation and Nested Configurations 

Polytopic Representation. Elementary systems of 4 elements, as described 
in the previous section, can further be used to describe longer sequences of 
musical events. In particular, sequences of $2^n$ elements can be arranged as an 
n-dimensional cube, within which each face potentially forms a S&C at time 
instants that share specific relationships in the metrical grid. 

For instance, a sequence of 16 chords can be divided into four sequences of 
four successive chords, each of them being described as separate systems. Then, 
these four S&Cs, taken as elementary objects, can be related by forming an 
upper-scale S&C, linking the four primers of the 4 lower-scale S&Cs. Figure 3 
represents such a description projected on a tesseract, in the case of the chord 
sequence from the chorus section of Master Blaster by Stevie Wonder: 


Cm Cm Cm Bb Ab Ab Ab Gm F F F F Cm Cm Bb Bb 


System Nesting. However, depending on the sequence, other arrangements of 
the systems may prevail. If we now consider the following example: 

Bm Bm A A G Em Bm Bm Bm Bm A A G Em Bm Bm 

a different configuration appears to be more efficient to explain the sequence 
with a multiscale model. In fact, grouping chords [0, 1, 8, 9], [2, 3, 10, 11], [4, 
5, 12, 13] and [6, 7, 14, 15], and then relating these four faces of the polytope 
by an upper-scale system [0, 2, 4, 6] leads to a less complex (and therefore more 
economical) description of the relations between the data within the systems. 
This nesting configuration is called P* in the rest of the paper and is distinct 
from the configuration considered in the first example, $P_0$, where the 
upper-scale system [0, 4, 8, 12] links four lower-scale nested systems $[4k + j]_{0 \le j < 4}$ for 
$0 \le k < 4$. Figure 5 illustrates these two configurations. 
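
To see why one nesting configuration can be more economical than another, the following sketch compares the two configurations on the second example sequence. It is a toy illustration only: it uses the systemic antecedent (the primer of each system) and counts an element as costly whenever it differs from its antecedent, which is a deliberately crude stand-in for the relation costs defined in the paper. The dictionary layout and function names are assumptions.

```python
# Index-based description of the two nesting configurations discussed above.
P0 = {
    "lower": [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11], [12, 13, 14, 15]],
    "upper": [0, 4, 8, 12],
}
P_STAR = {
    "lower": [[0, 1, 8, 9], [2, 3, 10, 11], [4, 5, 12, 13], [6, 7, 14, 15]],
    "upper": [0, 2, 4, 6],
}

def systemic_antecedents(config):
    """Map each index to the primer of the system in which it is not the primer."""
    antecedent = {}
    for system in [config["upper"]] + config["lower"]:
        primer = system[0]
        for i in system[1:]:
            antecedent[i] = primer
    return antecedent

def change_count(chords, config):
    """Toy cost: number of elements that differ from their antecedent."""
    antecedent = systemic_antecedents(config)
    return sum(chords[i] != chords[a] for i, a in antecedent.items())

SEQ = "Bm Bm A A G Em Bm Bm Bm Bm A A G Em Bm Bm".split()
cost_p0 = change_count(SEQ, P0)
cost_p_star = change_count(SEQ, P_STAR)
```

With this toy cost, the sequence requires 12 non-trivial relations under the first configuration but only 4 under the second, which illustrates the sense in which the second description is less complex.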

Therefore, multiscale polytopic descriptions involve different possible flows 
of dependencies and interactions between systems, which correspond to distinct 
nesting configurations. A nesting configuration is characterized by its 
corresponding antecedent function (as defined in Sect. 2.2). We furthermore assume that 
Fig. 3. Polytopic representation of the chord sequence taken from Master Blaster 
by Stevie Wonder. 

Fig. 4. Tesseract where elements of the same depth are aligned vertically. 

Fig. 5. Representations of the relations used by a multiscale analysis of a sequence of 
16 events projected on a tesseract: $P_0$ (left), P* (right). 


nesting configurations must respect a causality principle: that is, the antecedent 
of any element in a system must have been observed before that element. This 
leads to a partial order between elements in the tesseract, as depicted on Fig. 4. 


Static Configurations. Among all possible ways to construct nested 
configurations, an interesting subset consists in nesting faces of the polytope such that 
all vertices are used once and only once. In that case, valid nesting configurations 
consist in specific permutations of the initial index sequence. As, for each cube in 
the tesseract, there are three possible pairs of square systems corresponding to 
parallel faces of the cube, there is a total of 4 × 3 × 3 = 36 possible permutations 
such that each lower scale system contains only causal flows. 

Among these 36 possibilities, 6 are dual solutions. For 6 others, which we 
call Primer Preserving Permutations (PPPs), the system formed by the primers 
of each lower-scale system is itself a face in the polytope. PPPs preserve the 
role of elements with index $2^p$ as being primers of one of the systems in the 
configuration. Whereas the list of PPPs is easy to tabulate for a tesseract, a 
recursive algorithm can be used for larger values of n. Note that $P_0$ and P* are 
both PPPs (see Fig. 5). 

All configurations referred to in this section are made of four non-adjacent 
faces on the polytope, whose primers are related by a fifth upper-scale system. 
In the case of PPPs, the upper-scale system is itself a face in the polytope. 

Dynamic Nesting. Another way to define a nesting configuration is to 
construct it on-the-fly, by determining successively, for each element placed in 
contrastive position, which of the possible implication systems it is most 
advantageous to relate it to. In this case, the cost function is used for each system 
hypothesis, to select the optimal one and disambiguate the antecedent 
function when several options are possible. Looking at Fig. 4, it appears that nodes 
7, 11, 13 and 14 are contrastive in three different implication systems and node 15 in 6 
implication systems. Therefore, there exist $3^4 \times 6 = 486$ distinct dynamic 
nesting configurations. 
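
The count of 486 can be checked by enumerating, for every vertex of the tesseract, the square faces in which that vertex comes last. The sketch below assumes vertices are labelled by their 4-bit index (0 to 15), so that a face is obtained by clearing two of the bits set in the contrastive vertex; the function name is hypothetical.

```python
from itertools import combinations
from math import prod

def contrastive_systems(v, n=4):
    """Faces of the n-cube in which vertex v is the last (contrastive) element."""
    bits = [1 << b for b in range(n) if v & (1 << b)]
    # Each face is (primer, second element, third element, contrastive element).
    return [(v - a - b, v - b, v - a, v) for a, b in combinations(bits, 2)]

options = {v: contrastive_systems(v) for v in range(16)}

# Vertices that are contrastive in more than one implication system.
multi = {v: len(systems) for v, systems in options.items() if len(systems) > 1}

# One independent choice per contrastive vertex gives the number of
# dynamic nesting configurations.
n_dynamic = prod(len(systems) for systems in options.values() if systems)
```

Here `multi` lists nodes 7, 11, 13 and 14 with three candidate systems each and node 15 with six, and the product of the choices equals 486.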

3 Optimization and Algorithmical Aspects 

3.1 A Minimum Description Length Criterion 

Given a sequence $X = x_0 \ldots x_{l-1}$, the estimation process of the best PGLR, $S_X$, 
requires the definition of an optimality criterion embedding all the variables: 

$S_X = \operatorname{argmin}_{P,G,R} \mathcal{F}(P, G, R \mid X)$ (2) 

where P, G and R respectively denote the description of the polytope, the graph 
and the latent relations for sequence X. 

Assuming that $\mathcal{F}$ measures the complexity of the sequence structure, $S_X$ 
can be defined as the shortest description of the sequence. Therefore, searching 
for $S_X$ can be seen as a Minimum Description Length (MDL) problem [20] and 
$\mathcal{F}$ can be understood as a function that evaluates the size of the “shortest” 
program needed to reconstruct the data [6]. This is strongly related to the concept 
of Kolmogorov complexity, which has received increasing interest in the music 
computing community over the past years [11-13,17]. 

The exact computation of $S_X$ cannot be achieved and it is approximated in 
the following way: 

- the description cost of P can be estimated as a function of the regularity of 
the polytope. In this work it is discarded because all polytopes are tesseracts. 

- the description cost of G can be assumed to be constant for all configurations 
within a model class. It is related to the number of distinct possible graphs 
(DPG) in the PGLR. 

- the cost ($\mathcal{F}_r$) of the relations associated with a given nested configuration: 

$R_X = \operatorname{argmin}_R \{\mathcal{F}_r(R \mid G, X)\}$ with $\mathcal{F}_r(R \mid G, X) = \sum_{i=1}^{l-1} |r(\phi_G(x_i), x_i)|$ (3) 

where $\phi_G$ is the antecedent function associated to G and $|r(x, y)|$ is the cost 
of the relation between x and y. 






3.2 Optimization Process 

Given that the costs of P and G are assumed to be constant, the aim of the 
optimisation process is to estimate the set of latent relations. 

In the case of TRs, the process is rather straightforward: a relation between 
two chords in a triad circle is unique. 

Conversely, optimal transport provides multiple possibilities of connecting 
chords together. The exhaustive optimization over the whole sequence would 
require considering all combinations of transports. However, to make the 
computation tractable, the process is divided into several simpler sub-problems as 
follows. 

For the sequential model, the chord sequence is processed as groups of 4- 
chord progressions (fusing beforehand identical neighboring chords, for which the 
transport is trivially determined). Then the last chord of each group is related 
to the first chord of the next group by minimal transport. 

For the static systemic models, each elementary problem corresponds to a 
square system to optimize. Upper-scale systems are optimized first and then each 
lower-scale system is estimated independently. This process is repeated for each 
possible configuration. Details can be found in [10], with two adjustments which 
do not significantly impact the performance but substantially reduce the computational load: 
(i) for square systems, the contrast relation is optimized aside from the other 
systemic relations, (ii) the set of static configurations is restricted to PPPs. 

For the dynamic nested S&C model, each chord is considered successively 
in chronological order. Those which are directly related to the primer (nodes 
1, 2, 4 and 8) enable the estimation of the corresponding latent relation. Those 
which are in a single contrastive position (nodes 3, 5, 6, 9, 10 and 12) are used to 
complete the estimation of the corresponding systems. Some chords in contrastive 
position belong to several systems (nodes 7, 11, 13 and 14 to 3 systems and node 
15 to 6 systems): in these cases, the system with minimal cost is chosen. The 
whole process therefore results in a graph which has been built dynamically by 
successive optimisations of square systems. 

4 Experimental Validation 

4.1 Methodology 

Experimental Setups. To assess the ability of the PGLR model to capture 
structural information in chord sequences, we have carried out a set of 
experiments on a corpus of 727 × 16 beat-synchronous chord sequences from the RWC 
POP dataset [5]. 

These experiments aim at evaluating the relevance of the PGLR model and 
at comparing different chord representations, types of models and optimization 
schemes. 

The two types of chord representations presented in Sect. 2.1 (PCs and TRs) 
are considered in conjunction with optimal transport (for PCs and TRs) and 
triad circle relations (for TRs only). We compare the sequential bi-gram model 
(Seq) - a very common approach in MIR [15] - with different types of systemic 
models (Sys and S&C) as defined in terms of their antecedent functions in 
Table 1, as well as the dynamic approach (Dyn). 

For the systemic models, three types of system optimization are considered: 

- $S_0$, which corresponds to the static configuration $P_0$ (see Fig. 5, left); 

- S*, which corresponds to the globally optimal PPP over the whole corpus, 
which happens to be P* (see Fig. 5, right); 

- $S_X$: in this case, the optimal PPP is chosen a posteriori as the one that 
optimizes the description of X, which varies across all Xs. 

Perplexity. As there exists no ground truth as to the actual structure of a 
chord sequence, we compare the different models as regards their prediction 
ability. This is done by calculating for each model the perplexity [2], P, derived 
from the negative log likelihood (NLL), H. The aim is to measure how well an 
unseen sequence, $X = x_0 \ldots x_{l-1}$, can be predicted by the model: 

$H(X) = -\frac{1}{l} \sum_{i=0}^{l-1} \log_2 P(x_i \mid \phi(x_i))$ (4) 

with the convention $\phi(x_0) = x_0$ and $P(x_0 \mid x_0) = P(x_0)$. 

For the triad circle relations, $P(y \mid x)$ is estimated as the relative frequency of 
r(x, y) (and $P(x_0)$ is set to 1/24). Similarly, for a pitch class distance d, $P(y \mid x)$ is 
also estimated as the frequency of d(x, y) (and here, $P(x_0) = 1/12$). The learning 
phase for r and d is done using a 2-fold cross-validation strategy: probabilities 
are estimated on one half of the corpus (even numbered songs) and used on the 
other half (odd numbered songs) to compute H and vice-versa. 

For optimal transport, X is viewed as a set of simultaneous “voices”, $X^k$, 
and we compute H as the average voice NLL: 

$H(X) = \frac{1}{K} \sum_{k=1}^{K} H(X^k)$ (5) 

where K is the number of voices and each term $H(X^k)$ can be computed horizontally, using Eq. 4. 

Ultimately, the performance is reported in terms of perplexity, P, which can 
be understood as an estimation of the average branching factor in predicting the 
sequence knowing its PGLR structure: 

$P(X) = 2^{H(X)}$ (6) 

Note that, whereas PGLR is fundamentally optimized on the basis of a 
complexity criterion, its impact is evaluated in a probabilistic framework, so as to 
measure its capacity to compress the data information in a meaningful way. 
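
The evaluation measure of Eqs. 4 to 6 can be sketched in a few lines. The example assumes that the conditional probabilities of each element given its antecedent have already been estimated and are supplied as plain lists, and that logarithms are taken in base 2 so that the perplexity is 2 raised to the NLL; the function names are hypothetical.

```python
from math import log2

def negative_log_likelihood(probabilities):
    """Average negative log2-likelihood H of a sequence, given P(x_i | antecedent)."""
    return -sum(log2(p) for p in probabilities) / len(probabilities)

def perplexity(probabilities):
    """Perplexity 2**H, interpretable as an average branching factor."""
    return 2 ** negative_log_likelihood(probabilities)

def voice_averaged_perplexity(voices):
    """Perplexity when H is first averaged over voices (optimal transport case)."""
    h = sum(negative_log_likelihood(v) for v in voices) / len(voices)
    return 2 ** h
```

For instance, a sequence whose elements are each predicted with probability 1/4 has a perplexity of 4: on average the model hesitates between four equally likely continuations.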

4.2 Results 

Table 2 summarizes the perplexity figures obtained for a variety of experimental 
setups, from which a number of observations can be made. 
Table 2. Average perplexity obtained with 2-fold cross-validation for the different 
models on RWC POP. DPG stands for Distinct Possible Graphs. 

Model   Triad circle    Optimal transport                               DPG
        rotation on TR  Chromatic   Chromatic   Harmonic    Harmonic
                        on PC       on TR       on PC       on TR
Seq     8.00            3.32        3.58        4.11        4.50        1
Sys_0   8.88            3.43        3.68        4.32        4.72        1
Sys_*   7.62            3.12        3.11        3.86        4.23        1
Sys_X   5.78            2.66        2.73        3.18        3.41        6
S&C_0   6.68            2.97        3.16        3.92        4.06        1
S&C_*   5.35            2.60        2.71        3.39        3.56        1
S&C_X   4.63            2.39        2.48        2.99        3.12        6
Dyn_X   4.82            2.55        2.44        4.29        4.32        486


Benefit of Systemic Organizations. Systemic models globally outperform 
the sequential one¹: all perplexity values are lower, except for the basic $Sys_0$ 
configuration. In particular, the $S\&C_X$ model provides the most spectacular 
perplexity improvement for all types of chord representations and relations (at 
the expense of a very limited number of DPGs). Note that the P* configuration 
provides a noticeable advantage over Seq and $P_0$ configurations. The last row 
of the table also shows that the dynamic nesting approach is an interesting 
alternative as it provides perplexity scores almost as favorable as $S\&C_X$. 


Predictive Support of the Virtual Element. The effectiveness of the 
virtual element in the S&C scheme is underlined by the systematic improvement 
observed when shifting from Sys to S&C results. The virtual element, $\hat{x}_3$, in 
S&C appears globally as a better antecedent for $x_3$ than does the primer, $x_0$, in 
Sys. However, for about one third of test sequences $Sys_X$ outperforms $S\&C_X$ 
(figure not reported in Table 2), in particular for aaba structures. 


Triad Circle Relations vs. Optimal Transport. The performance of triadic 
circle relations (TCRs) is based on a global sequence entropy while the 
optimal transport (OT) approach is evaluated in terms of average “per voice” 
entropy. In particular, the maximal branching factor of TCRs is 24 instead of 
12 for OT. Therefore, the two perplexity scores cannot be compared. However, 
both approaches show similar trends w.r.t. the relative model performance. This 
supports the hypothesis of a general benefit of the multiscale approach rather 
independently from the way the chord information and relations are encoded. 


1 This confirms preliminary results formerly obtained on a much smaller corpus of 45 
chord sequences [10]. 
In Table 2, results are also provided for optimal transport on triadic 
reductions (TRs) treated as PC descriptions. Here too, the relative performance levels 
across models show the same trends. Note that the perplexity on TRs is slightly 
higher because the average pitch class distance between triads tends to be larger 
than that between chords with 4 notes or more. 


Harmonic vs. Chromatic Transport. In chromatic optimal transport, the 
distance is computed from the set of note displacements measured on a semitone 
scale. We also tested a harmonic distance by considering displacements on the 
circle of fifths. Results in Table 2 show that this globally degrades the 
performance. Conversely, there is no need to consider triad rotations on a chromatic 
circle, as this is formally equivalent to modeling systems on the circle of thirds. 

5 Conclusions 

Both from the conceptual and experimental viewpoints, the PGLR approach 
appears as an efficient way to model multiscale relations in music segments. It 
is expected to provide a useful framework for a number of tasks in automatic 
music processing, as well as offering an interesting tool for music analysis. 

Given that its core principles are not specific to a particular type of musical 
information, the application of PGLR to other types of musical objects, such as 
melodic motives and rhythmic patterns is a rather natural extension, currently 
under investigation. Ongoing work also includes the extension of the PGLR 
model to a larger range of timescales (n-cubes) and to chord patterns of other 
lengths (using irregular polytopes, by truncating or duplicating vertices, edges 
or faces, as in [7]). 
Abstract. Music information retrieval techniques are used to automatically 
extract structural data from a piece; however, there have been few 
attempts to study ways to automatically identify the musical form of 
digital files. In this work we present an implementation of the dynamic 
time warping algorithm for the automatic identification of musical form 
structure by means of a segmentation matrix in which we group elements 
according to maximal similarity. The system was implemented for symbolic 
files parsed with the music21 library. We tested it on two pieces: Bagatelle 
No. 25 in A minor by L. v. Beethoven, and Piano Sonata No. 11 in A major, 
K331, movement 3 by W. A. Mozart. The system correctly identified 
the similar sections of both pieces, which are in rondo form. We foresee that 
this algorithm can be extended to measure harmonic similarity and thus 
be able to analyze more complex forms, like a sonata. 


Keywords: Musical form • Dynamic time warping 
Music information retrieval 


1 Introduction 

Music Information Retrieval uses different algorithms and techniques to extract 
structural content from musical files. There are two main approaches to this: 
on the one hand, we can study an audio signal, as in [9,10]; on the other, we 
can take a symbolic music file, such as MIDI or MusicXML, and apply pattern 
recognition techniques to extract harmonic or melodic content, an approach 
taken in [3,5]. 

Many of the Music Information Retrieval problems addressed revolve around 
developing genre recommendation systems [7], style imitation with artificial 
intelligence and machine learning [4], chord recognition and harmonic extraction [12], 
and instrument recognition from an audio file [14]. However, there have been few 
attempts to analyze and extract musical form. Among these is [8], in which the authors 
find internal similarity within an audio file to locate sections that relate to 
one another and can be used as transitions between them. In [1], the authors analyze 
the internal similarity of an audio file to create a general thumbnail, that is, the 
minimal amount of music that represents the whole piece; they use this in order 
to simplify the search in extensive databases. 

In symbolic music file analysis, there is a need to apply this kind of 
similarity analysis to create a system that classifies the internal similarity of a 
musical piece in order to find its musical form. We argue that a system that can 
extract and identify the formal structure of symbolic music files can be useful 
to simplify other tasks, like automatic harmonic and chord labeling, by removing 
redundancies in the calculation of repeating or similar sections. It can also be a 
useful pedagogical tool in music education contexts to help the 
student better understand the concept of form. 

In this work, we propose to apply the time series technique Dynamic Time 
Warping to a symbolic music file. This algorithm is normally used on audio 
signals, where it is useful for calculating a measure of how different two signals are by 
finding the cost of transforming one into the other by means of time stretching. 
The novelty of this work is that instead of working with audio files, we 
use the much smaller symbolic representation of music. In order to classify 
the musical form, we run through all possible sub-segments of the piece and 
compare them in a similarity matrix, similar to the one used in [1]. We then 
group segments with maximal similarity and label them as a new section. The 
process is repeated until all the maximal similarities have been found. In the 
present work we apply the algorithm to find the repeating section of a rondo 
form. 

2 Dynamic Time Warping 

The DTW algorithm uses dynamic programming to find the optimum alignment 
between two time series. It does this by calculating the cost of aligning each point of 
the first series with each point of the second. Afterwards, it takes the path of minimal 
cost needed to transform one series into the other. It is very useful in cases where we need 
to compare sequences that are time stretched or transposed. Because of this 
flexibility we consider that its application to a musical context would bring 
optimal results. 

The following is a brief review of DTW as presented by Müller in [11]. 
If we have sequences $X := (x_1, x_2, \ldots, x_N)$ and $Y := (y_1, y_2, \ldots, y_M)$, the warping 
path $p = (p_1, \ldots, p_L)$ is defined as the assignment of $x_{n_l}$ to $y_{m_l}$. The path must 
satisfy the boundary condition that it always aligns the first elements, and the 
last elements of the sequences, respectively; it must be monotonic; and it may only move 
by unitary steps; also, all elements from X and Y must be paired and there can 
be no repetitions. Müller also defines the total cost as 

$c_p(X, Y) = \sum_{l=1}^{L} c(x_{n_l}, y_{m_l})$ (1) 
where c is the local cost; in this case we will use the Euclidean distance as a 
measure of difference. To find the optimal cost we search for the path with the 
minimum value. 

$DTW(X, Y) = \min \{ c_p(X, Y) \mid p \text{ is a transformation path} \}$ (2) 

With this we find the minimum cost over the different possible paths 
transforming X into Y. This measure will be useful to compare the similarity of segments 
within the piece. The system developed in this work uses FastDTW [13], 
which is based on dynamic programming concepts. The DTW measure can be 
used for query systems by comparing one small segment to different windows of 
a larger segment; this methodology is used in [6]. In order to apply this to our 
symbolic musical file, we use the sequence of notes given by a MusicXML 
file and treat them as a time series. 
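
As a point of reference, the DTW cost of Eq. 2 can be computed with a short dynamic programme. The sketch below is the plain quadratic-time algorithm, not the FastDTW approximation used by the system; it assumes one-dimensional sequences (for example MIDI pitch numbers), for which the Euclidean local cost reduces to an absolute difference, and the function name is hypothetical.

```python
def dtw_cost(x, y):
    """Minimal accumulated cost of aligning sequences x and y with unitary steps."""
    n, m = len(x), len(y)
    inf = float("inf")
    # acc[i][j] holds the best cost of aligning x[:i] with y[:j].
    acc = [[inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(x[i - 1] - y[j - 1])  # Euclidean distance in one dimension
            acc[i][j] = local + min(
                acc[i - 1][j],      # x advances, y is held
                acc[i][j - 1],      # y advances, x is held
                acc[i - 1][j - 1],  # both advance
            )
    return acc[n][m]
```

A time-stretched copy of a sequence aligns at no cost, e.g. `dtw_cost([1, 2, 3], [1, 2, 2, 3])` is 0, which is the flexibility that motivates the use of DTW here.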

3 Segmentation Matrix 

The proposed system was implemented in Python with the library music21 [2], 
which is well suited to the parsing and processing of symbolic music files like 
MIDI and MusicXML. In order to apply the DTW algorithm we need to prepare 
the data so it takes the form of a time series. We use the flat functionality in the 
music21 library to convert the XML file to a linear representation of notes and 
time offsets. Next, we translate the musical information into a time series of all 
the notes, thus obtaining a sequence of ordered pairs giving the time stamp and the 
pitch of a particular event: 

$P = \{(time_i, pitch_i)\}$ (3) 

where $0 < i \le N$, and N is the total number of notes in the piece to analyze; 
we will call this the events list. 

As a next step we create a list of all the possible sub-segmentations of the 
events list, that is, all the possible subsets of P in which all consecutive elements 
from i to j are present. 

$Segs = \{ U(i,j) \subset P \mid i \le k \le j \Rightarrow (t_k, p_k) \in U(i,j) \}$ (4) 

We will use the notation $Segs_{ij}$ to indicate the segment of time-consecutive 
notes from element i to j in the events list. We have $N^2$ different segments, and 
we need to group them according to their similarity. In order to do this, we 
create a similarity matrix in which each entry (p, q) holds the similarity 
measure between Segment p and Segment q, that is 

$[a_{p,q}] \in \mathbb{R}^{N \times N}$ (5) 

$a_{p,q} = DTW(Seg_p, Seg_q)$ (6) 
In this matrix we calculate the DTW similarity measure between all segments, 
and it can thus be used to classify and group similar sections 
of the piece. The method for computing the DTW similarity value is exactly the 
same as that used for acoustic time series, as shown by Müller [11], which uses dynamic 
programming to find the cost of transforming one series into the other, as was 
explained in the previous section. 

4 Musical Tests 

For the purpose of testing the system we chose two pieces that have a rondo 
form, in order to see whether the algorithm correctly identifies the repeating 
pattern. 

In the Bagatelle in A minor, Fur Elise, from Beethoven we have a rondo 
structure in which there is a section A at the beginning, and the musical form 
as a whole isA-B-A-C-A. When we apply the DTW measures to this 
piece we obtain a clear indication of a repeating section. In this work we show 
an example of use of the algorithm to find the repeating section of the rondo 
piece, that is, the A part. In Fig. la, we have a plot of time offset to the cost of 
the current test window. The graph takes an initial window of size 30, and runs 
through the rest of the piece in steps of 10 notes, obtaining the cost of comparing 
the test window to each of the other segments, the initial window size and step 
length were chosen empirically. The program then increases the window size to 
check if a larger frame would give a better matching result for some sections. 
The windows change in size from 30 to 100 in steps of 10, Fig. 1 show the costs 
of different windows sizes. Given the local minimums that appear around offsets 
20, 55 and 130 we see that there appears a clear similarity between these sections 
and the test window. Given that the test window was the initial part of the piece 
and repeats several times, we will call it section A, that in fact corresponds to 
the repeating section of the rondo. To analyze if the parts that differ from A 
are in fact sections B and C, we would need to apply the algorithm starting in 
the first offset not classified as section A, and search for similarities, this will 
be addressed in future work, but currently the algorithm shows a good result in 
classifying one section at a time. 
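The scanning procedure described above can be sketched as follows. This is a hedged illustration, not the authors' code: offsets are counted in notes rather than in time, `sum_abs_diff` is a simple stand-in for the segment cost (a DTW cost function with the same two-argument signature could be passed instead), and the toy "rondo" is invented.

```python
def window_costs(pitches, cost, window=30, step=10):
    """Compare the opening window with every later window of the same size."""
    test = pitches[:window]
    return [(start, cost(test, pitches[start:start + window]))
            for start in range(0, len(pitches) - window + 1, step)]


def local_minima(curve):
    """Offsets whose cost is strictly lower than that of both neighbours."""
    return [curve[k][0] for k in range(1, len(curve) - 1)
            if curve[k][1] < curve[k - 1][1] and curve[k][1] < curve[k + 1][1]]


def sum_abs_diff(a, b):
    """Stand-in cost for equal-length windows (assumption, replaces DTW here)."""
    return sum(abs(x - y) for x, y in zip(a, b))


# Invented toy piece with the shape A-B-A-B-A, three notes per section.
A, B = [60, 62, 64], [70, 71, 72]
piece = A + B + A + B + A
curve = window_costs(piece, sum_abs_diff, window=3, step=3)
print(curve)
print(local_minima(curve))
```

Repeating the scan for several window sizes, as in the text, amounts to calling `window_costs` in a loop over `window` and plotting one curve per size.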

Mozart’s Piano Sonata No. 11 in A major, K. 331, movement 3, also known 
as Rondo Alla Turca, has a form similar to that of the Beethoven piece. 
Applying the DTW algorithm to it gives the cost graph shown in Fig. 1b. 
The beginning section of the piece repeats at time offsets 
around 40, 140, and 160. At offset 140 the match is almost identical regardless 
of window size, while in the other cases there is some minor variation in the 
structure, as shown by the higher cost obtained. We call this repeating 
section A. Again, to find the remaining sections we would need to apply the 
algorithm to the part of the piece not labeled as A; this analysis is left 
for future work. 

(a) Beethoven, Für Elise, window size [30,100] in steps of 10 
(b) Mozart, Rondo Alla Turca, window size [30,100] in steps of 10 

Fig. 1. Cost analysis of an initial test window in (a) Für Elise and (b) Rondo Alla 
Turca. The lines show the costs for different window sizes. The bottom line corresponds 
to a window size of 30 and the top line to 100; the lines in between increase in steps 
of 10 from the initial window size. 




5 Conclusions 

The use of DTW to compare internal sections of a musical piece gives us enough 
flexibility to compare sections of a work across all possible 
combinations of window sizes; we use this to find section repetitions, variations 
and changes. The current work presents a way to identify a repeating section, 
and it will be further developed to find and classify all repeating sections. One 
of the main open challenges is that we do not yet have a clear maximum value 
for the window size, so we have to decide on the cost tolerance of a section 
and assign to each part of the score a degree of membership in the test window. 
In future work, this will lead us to define fuzzy membership functions that 
help decide how to segment the similar sections given by the DTW 
algorithm. The current implementation takes into account only the pitch and 
time stamp of each note; in future work we could implement other 
metrics that include more parameters, such as harmonic distances, which can help to 
further classify similar parts and sections of a symbolic music file using features 
other than just the pitch of the notes. Another point to consider is the range of 
the cost values. In this version the cost is given directly by the DTW, and 
it can range from hundreds to thousands; it would be a real benefit 
to find a suitable normalization, so that we have a clearer idea 
of what the cost indicates. In conclusion, Dynamic Time Warping gives us 
a good way to find internal similarity in a musical work and thus 
its structural form. We foresee that approaches of this kind, complemented with 
other similarity measures, will contribute to future work in Music Information 
Retrieval research. 

Abstract. The problem of identifying musical styles using mathematical tools 
is central not only in musicology and the mathematical theory of music, but also 
in applications to music pattern recognition and automated music generation in a 
particular idiom. In this paper we propose a methodology related to the transition 
network approach developed by D. Cope in his Experiments in Musical 
Intelligence, EMI. This extension allows stylistic 
cells to be defined at different scales as motifs and moduli of networks at the 
corresponding scale. It can be applied to study recursive aspects of music. We also outline 
how this methodology can be used to systematically study stylistic changes in 
different contexts by incorporating probabilistic and statistical tools and connections 
with other approaches. 

Misura ciò che è misurabile, e rendi misurabile ciò che non lo è. 

(Measure what is measurable, and make measurable what is not). 

Attributed to Galileo Galilei. 


1 Introduction 

One of the most interesting problems in musicology and the history of music is the 
identification of a particular musical style. Developing a systematic methodology to 
identify and classify stylistic trends has deep theoretical implications not only in 
analysis, composition and musicology, but is also relevant in specific applications such 
as automated music generation and authorship validation. The problem is not new and 
has been addressed in different ways (see Sect. 3.4.2 on stylistic classification in 
(Nierhaus 2009) for a historical perspective and (Hardoon et al. 2014) for a recent 
approach using machine learning and information theoretical tools). In the context of 
automated music generation in a particular idiom much has been done, particularly 
using Markov chains. We refer to (Collins and Laney 2017) and the bibliography 
therein for an up-to-date account of the problem, in particular for new approaches 
dealing with the non-Markovian nature of music. David Cope, in particular, used 
transition networks in his Experiments in Musical Intelligence, EMI, to generate music 
in the style of a particular composer (e.g. The Well Programmed Clavier or Virtual 
Bach). In this methodology a formal grammar incorporating probabilistic 
elements is applied in order to compose stylistically admissible pieces. As Cope points 
out, recombination plays an important role in the emulation and composition of music 
in a particular style (Cope 2004). In EMI, Cope proposes a way in which basic 
elements can be recombined by means of transition networks and a careful selection of 
the units and patterns to be used. Similar approaches that also incorporate 
probabilistic elements have been implemented (see also (Loy 2006)). The basic idea is that 
transitions between different elements, for instance notes, chords or rhythmic values, are 
studied and their probabilities computed from a music fragment or piece(s). In 
more technical language, a transition matrix is thus obtained and a corresponding 
Markov chain associated with it. The states of this chain correspond to the elements being 
considered. In the case of chords, a matrix element would give the probability of a 
specific chord, say the dominant, being followed by another, say the tonic. In this way 
concatenations of melodic and rhythmic motifs or harmonic progressions can be 
compared in terms of their likelihood. This already provides a first element for 
understanding some stylistic features. For instance, in baroque or classical music melodic 
transitions corresponding to stepwise motion would be more likely than others, 
whereas in serial music transitions across wider intervals have a non-negligible 
probability (see the example in the next section). In (Farber 2001) a very interesting 
study of Palestrina’s counterpoint is made. The authors incorporate stylistic rules in a 
probabilistic model for the generation of cantus firmi in the style of Palestrina, 
encoding strict rules as forbidden transitions, i.e. transitions with probability zero. They 
also mention that the inverse process of inferring the rules from computed probabilities 
is possible. 
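To make the idea of a transition matrix concrete, here is a minimal sketch that estimates first-order transition probabilities from a sequence of states. The chord progression is hypothetical and the function name is ours; the normalization used here is per row (probability of the next state given the current one), which is the usual Markov chain convention and an assumption on our part.

```python
from collections import Counter, defaultdict


def transition_probabilities(sequence):
    """Estimate first-order transition probabilities P(next | current)."""
    counts = defaultdict(Counter)
    for current, following in zip(sequence, sequence[1:]):
        counts[current][following] += 1
    return {
        state: {nxt: c / sum(row.values()) for nxt, c in row.items()}
        for state, row in counts.items()
    }


# Hypothetical chord progression in Roman numerals, for illustration only.
chords = ["I", "IV", "V", "I", "V", "I", "IV", "V", "I"]
P = transition_probabilities(chords)
print(P["V"])   # distribution over the chords that follow the dominant
print(P["I"])   # distribution over the chords that follow the tonic
```

Transitions that never occur simply have no entry, which corresponds to the "forbidden transitions" with probability zero mentioned above.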

In recent years, several classification tools have been applied to music. The use of 
machine learning techniques is promising and genre classification constitutes a very 
active area of research (Bassiou et al. 2015). Somewhat surprisingly, similar 
methods have been used less in musicological analysis. 

In the present work we also use transition matrices, and take them as the starting 
point to construct graphs. Our goal is to show that important structural and stylistic 
features of a piece of music can be defined and inferred from these graphs. 

We now outline how this is done. In a forthcoming paper, these ideas are systematically 
applied to study the stylistic differences in the keyboard music attributed to 
Charles and Louis Couperin (Knights et al. 2017a), but for the sake of completeness 
we include an example here. Take, for instance, the transition matrix associated with a 
melodic line. Rows and columns represent notes, so the element in the E row and the 
G column of this 12 by 12 matrix is simply the probability that, in the given melody, E 
is followed by G. We can now construct a graph in which the nodes are the notes 
and the edges are arrows, each with a weight (the transition probability just 
described). There are several advantages in using this graph theoretical approach. Once 
a graph for the melody has been obtained, its topological and connectivity properties 
contain important information on the melodic features, and stylistic 
analysis and comparisons can be made systematically using them. To mention a 
specific example in the melodic case, moduli or communities in the graph might be 
associated with different tonal regions, and the detection of small repeated patterns in the 
graph also provides a way of extracting information about the motivic structure of the 
piece. 

An element that can also be incorporated in these models is the hierarchical nature 
of music. In (Tidhar 2005) a systematic approach using music grammars is presented to 
study the unmeasured preludes attributed to Louis Couperin. In these 
works a hierarchical structure is determined by the slurs in the score, but in general, and 
given a music fragment, it is not straightforward to establish the different levels of 
organization. This aspect plays an essential role not only in style recognition, but it is 
fundamental also in the way composers, listeners and performers perceive music. It can 
be said that one of the more elusive aspects of music is precisely the interplay between 
several structural levels and its recursive character. Using a graph theoretical approach 
there is a simple way of identifying hierarchical structural features by studying the 
topological properties of the constructed graphs at different scales. In other words, we 
suggest a procedure for identifying the basic elements at different scales. In a more 
standard musical terminology and again referring to the melodic case, we will detect 
typical motifs at a first hierarchy, then phrases at a higher level and so on. These 
elements will be the motifs and modules (in a graph theoretical sense) of a graph 
associated with a particular fragment of music. The possibility of studying different 
hierarchical levels in this way could also be compared with other musicological 
approaches such as the structural Schenkerian analysis. 

As we remarked above, there is a natural and standard way to associate with a musical 
fragment either a weighted directed graph or just a directed graph. A weighted directed 
graph is obtained by taking the entries of the transition matrix as the weights of the 
adjacency matrix of a graph (see the next section for further details). A directed graph (with 
no weights) can also be constructed by keeping only the edges whose weight is larger 
than a certain threshold. We stress again that standard graph theory methods can be 
employed in order to determine the moduli (or communities) of the graph¹ and other 
properties. One such modulus constitutes what one might call a structural cell or motif². 
This might seem technical, but in the next section we illustrate the methodology by 
means of an example comparing two contrasting musical fragments, one built from several 
cantus firmi and another from a serial melody by A. Schoenberg. Before doing so, we 
summarize our approach: we encapsulate the style of a piece by extracting information 
on the probability of the choices that the composer makes, consciously or 
unconsciously, in different musical aspects (melodic, harmonic, rhythmic, textural or 
timbral) at different hierarchical levels. 
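The passage from a weighted directed graph to an unweighted one by thresholding can be sketched as follows. The weights and the threshold value are invented for the illustration, and the dictionary-of-dictionaries representation is just one convenient choice.

```python
def threshold_graph(weights, threshold):
    """Keep only the directed edges whose weight exceeds the threshold."""
    return {
        (src, dst)
        for src, row in weights.items()
        for dst, w in row.items()
        if w > threshold
    }


# Hypothetical weighted adjacency (transition probabilities) on a few notes.
weights = {
    "D": {"E": 0.6, "F": 0.3, "A": 0.1},
    "E": {"D": 0.7, "F": 0.3},
    "F": {"E": 1.0},
}
edges = threshold_graph(weights, threshold=0.25)
print(sorted(edges))
```

The choice of threshold controls how sparse the resulting graph is, so any structural feature read off the graph should be checked for stability under small changes of that value.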

Although the theoretical aspects are very interesting, it must be said that an 
essential motivation for us was a set of concrete musicological questions. These 
involve authorship attribution and are dealt with in separate papers (Knights et al. 
2017a, b). 


1 Methods based on random walks provide a way of generating music fragments (see for instance 
(Collins and Laney 2017)). 

2 Not to be confused with the motifs introduced by U. Alon in the context of systems biology, which 
will also be used. In order to avoid confusion we employ the term structural cells for moduli of 
graphs. 

The rest of the paper is organized as follows. In the next section a simple melodic 
example is presented in order to introduce the methodology. We do this by comparing 
the results in two contrasting music fragments. In the final section we describe a few 
applications to the Allemandes attributed to Louis Couperin. Some musicological and 
historical implications are discussed and further research is also considered. 


2 Hierarchical Moduli of Networks 

We now proceed to describe the methodology outlined in the introduction. For simplicity 
we restrict the detailed analysis to melodic aspects and comment on how the 
methodology can be applied to take into account rhythmic, harmonic or other structural 
considerations. A clear understanding of how to combine these elements, 
either in automated composition or in analysis, is still missing and is one of the 
fundamental questions in the area. The first melodic lines are five cantus firmi in Dorian 
mode (see Fig. 1). In order to have sufficient melodic material we concatenate these 
fragments and treat them as one, avoiding the repetition of the final and initial D’s between 
segments. 


Fig. 1. Cantus firmi in Dorian as in (Jeppesen 1992) 


The second melody is taken from The Book of the Hanging Gardens, op. 15, by A. 
Schoenberg (Fig. 2). 

Fig. 2. A. Schoenberg’s Book of the Hanging Gardens, measures 8-16. From (Domek 1979) 

Some basic statistical quantities already contain useful information 
about these fragments. Figs. 3 and 4 show a histogram of the pitch classes for each of the 
fragments, more precisely, the percentage of each pitch class appearing in the 
examples. 


Pitch class distribution for Dorian cantus firmi 



Fig. 3. Histograms of the pitch classes in the Dorian cantus firmi 


Notice that the distribution of the notes is much more uniform in Schoenberg’s 
example, as is expected in a serial piece with no functional tonal center, as opposed 
to the cantus firmi, in which the most frequent pitch is precisely D, the tonal center. In 
particular, in the Dorian cantus firmi no chromatic notes appear in the histogram (the 
only accidental in the histogram is B flat, or rather its enharmonic A#, which belongs to 
the mode). It is clear that the histogram provides us with a basic and simplified 
summary of the melodic material used. This may already help in establishing some 
genre or author differences, even in fragments not as contrasting as the ones discussed 
so far (see the discussion at the end of this paper and (Knights et al. 2017a)). 
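A pitch-class histogram of this kind is straightforward to compute. The sketch below is an illustration only: the melody is a made-up Dorian-like line given as MIDI note numbers, not one of the fragments analyzed in the paper, and the function name is ours.

```python
from collections import Counter

PITCH_CLASSES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def pitch_class_percentages(midi_pitches):
    """Percentage of notes falling in each of the 12 pitch classes."""
    counts = Counter(p % 12 for p in midi_pitches)
    total = len(midi_pitches)
    return {name: 100.0 * counts[pc] / total
            for pc, name in enumerate(PITCH_CLASSES)}


# Hypothetical line centred on D (MIDI 62); not taken from the paper's data.
melody = [62, 64, 65, 67, 69, 67, 65, 64, 62, 60, 62]
hist = pitch_class_percentages(melody)
print(max(hist, key=hist.get))   # most frequent pitch class
print(hist["C#"])                # a pitch class absent from the line
```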

























P. Padilla et al. 

Pitch class distribution for Schoenberg’s example (horizontal axis: pitch class, C to B) 

Fig. 4. Histograms of the pitch classes in Schoenberg’s example. 


Another useful plot to consider is the histogram not of pitch classes, but of 
intervals in a work. This is done in Figs. 5 and 6, which show the proportions of 
intervals, rather than of the pitches themselves, appearing in the fragments. These 
distributions provide systematic information about the melodic contours that would be 
difficult to assess otherwise. 
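The interval distribution can be obtained from the same data by taking differences between consecutive pitches. In this sketch intervals are measured in signed semitones (negative values are descending), which is an assumption about the representation; the melody is again hypothetical.

```python
from collections import Counter


def interval_proportions(midi_pitches):
    """Proportion of each melodic interval (signed semitones) in a line."""
    steps = [b - a for a, b in zip(midi_pitches, midi_pitches[1:])]
    counts = Counter(steps)
    return {iv: c / len(steps) for iv, c in sorted(counts.items())}


# Hypothetical line: an upward leap followed mostly by descending steps.
melody = [62, 69, 67, 65, 64, 62, 65, 64, 62]
props = interval_proportions(melody)

# Share of seconds (1 or 2 semitones in either direction).
seconds = sum(p for iv, p in props.items() if abs(iv) in (1, 2))
print(props)
print(seconds)
```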

From these histograms we see that in the Dorian cantus firmi the predominant interval is 
the descending second (either major or minor) and that intervals of a second account for 
approximately 80% of the intervallic material used. This is in agreement with the usual 
prescription for composing a cantus firmus, which should consist essentially of stepwise 
motion. Moreover, upward melodic leaps tend to be 
compensated by descending stepwise motion, which makes the descending seconds the 
most common intervals in the histogram. This feature clearly distinguishes the fragment 
from the Schoenberg example, in which the interval distribution is also much more 
uniform. We comment on these features not because they are surprising or new, but to 
illustrate the fact that even simple quantitative indicators might be helpful in 
distinguishing stylistic characteristics. In fact, in stylometric measurements in literature, even 
the frequency of words used by different authors might serve as a basis for 
classification (Stamatatos 2009). 

Interval distribution for cantus firmi in Dorian 



Finally, we explain what the transition distribution matrix of pitch classes is, since 
it is the basic element that will allow us to apply statistical analysis, generate graphs 
and use standard tools from graph theory to propose what might be called stylistic 
signatures. We consider any given note in a music fragment, let us say E in the cantus 
firmi example, and look for the probability that it goes to a D, relative to the total 
number of transitions (intervals) in the piece. We compute it by counting the number of 
times E is followed by D and dividing by the total number of transitions. A graphic 
representation of these matrices is presented in Figs. 7 and 8. For the specific example 
this would be represented by the square on the E row (pitch class 1) and the D column 
(pitch class 2). It can be seen at first glance that the matrix corresponding to the cantus 
firmi is sparser than the other. Also, the fact that the elements in Fig. 7 are concentrated 
in the upper and lower diagonals is related to the preponderance of stepwise motion. 
However, other relevant features, such as the existence of a tonal center, D in this case, and 
its fundamental structural role, can also be inferred from the graphical representation of 
the transitions. Of course such a representation can provide useful visual information, 

Interval distribution for Schoenberg’s example (horizontal axis: interval) 

Fig. 6. Interval distribution for the intervals in Schoenberg’s example. 


but as such is limited (Fig. 9). However, the matrix itself, as an array of numbers, 
provides the raw material for a more quantitative analysis that can include standard 
techniques³. For the sake of completeness, we include the corresponding matrix for the 
cantus firmi example as computed with Miditoolbox⁴. Notice the different row order, 
which is more conventional for a transition matrix. The empty spaces correspond to 0 
and are left blank for clarity. With the matrix for this example, we now proceed to 
construct the associated graph. We explain the procedure in detail for the cantus firmi; 
the other example is treated in a completely analogous way. Each note is taken as a 
node and every nonzero element of the matrix as an edge joining the two corresponding 
notes. An empty space or 0 means that there is no connection between the notes. For 
instance, given that from D to E there is a nonzero element, an edge from the former to 
the latter note has to be drawn. After doing this for all the notes we obtain the graph 


3 For instance, singular value decomposition, spectral analysis, principal component analysis, etc. See 
(Knights et al. 2017a) for more details on statistical methods. 

4 The previous histograms for pitch classes and intervals were also generated using Miditoolbox 
(Toiviainen and Eerola 2016). 

Pitch class transitions for the Dorian cantus firmi (axis label: Pitch-class 2) 

Fig. 7. Interval transition matrix for the Dorian cantus. 


shown in Fig. 10a. In standard graph theoretical terminology the transition matrix plays 
the role of the adjacency matrix. Notice that the figure shows neither the weight of the 
connection (the actual numerical value of the corresponding element) nor the direction of 
the transition. As before, we point out a few interesting facts that can be 
observed right away. For instance, there are six nodes with no edges leading to or 
coming out of them (C, C#, D#, F#, G# and B), which except for C# correspond to 
notes not belonging to the mode⁵. The nodes with the largest number of edges are D, G 
and A, which even without any harmonic consideration or reference to tonic, 
subdominant and dominant can be seen to play an important role. The notions of connected 
components (disjoint parts into which the graph can be naturally separated, 7 in this 
case), degree of a node (number of edges associated with a node), indegree and 
outdegree (number of edges going into or out of a node, respectively) and many other 
quantities can be computed in order to characterize a musical fragment in this way 
(Fig. 10b). 
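The construction just described (transition matrix, associated graph, degrees and connected components) can be sketched in plain Python. The melody below is invented, so the numbers it produces are not those of the cantus firmi example; self-transitions (repeated notes) are ignored when drawing edges, which is our simplifying assumption.

```python
from collections import Counter

NODES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def joint_transition_matrix(notes):
    """Count each ordered pair (a, b) and divide by the total number of transitions."""
    pairs = Counter(zip(notes, notes[1:]))
    total = sum(pairs.values())
    return {pair: c / total for pair, c in pairs.items()}


def undirected_graph(matrix, nodes):
    """Adjacency sets: a and b are joined if either transition is nonzero."""
    adj = {n: set() for n in nodes}
    for (a, b), w in matrix.items():
        if w > 0 and a != b:
            adj[a].add(b)
            adj[b].add(a)
    return adj


def connected_components(adj):
    """Depth-first search; isolated nodes count as components of size one."""
    seen, components = set(), []
    for start in adj:
        if start in seen:
            continue
        stack, comp = [start], set()
        while stack:
            node = stack.pop()
            if node not in comp:
                comp.add(node)
                stack.extend(adj[node] - comp)
        seen |= comp
        components.append(comp)
    return components


melody = ["D", "E", "F", "E", "D", "A", "G", "F", "E", "D"]   # hypothetical line
matrix = joint_transition_matrix(melody)
adj = undirected_graph(matrix, NODES)
degree = {n: len(neighbours) for n, neighbours in adj.items()}
components = connected_components(adj)
print(len(components), degree["D"], matrix[("E", "D")])
```

In this toy case the five notes that occur form one component and each of the seven unused pitch classes is a component of its own, mirroring the way isolated nodes are counted in the text.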


5 C# enters the mode as a leading tone, although it does not belong to it in a strict sense. 

Pitch class transitions for Schoenberg’s example 

Fig. 8. Interval transitions matrix in Schoenberg’s example. 

Fig. 9. Interval transition matrix for the Dorian cantus (numeric values). 

Fig. 10a. Graph associated to the cantus firmi example. 

Fig. 10b. Graph associated to the Schoenberg example. 


For the corresponding analysis in the Schoenberg example, we see that the nodes 
are much more interconnected and that the degree of a node does not vary much. 

Other less intuitive parameters include the clustering coefficient or the network 
centralization, which could be obtained using standard network packages⁶. Each one of 
these estimators can be used to compare diverse musical fragments and to look for 
stylistic signatures of a given period or composer⁷. 

6 In particular, we used Cytoscape in order to analyze the networks. This open source software 
already has a built-in analyzer. 

We are interested in further exploring other features that can shed light on the 
structure of a certain piece and that can provide criteria to compare different works, 
different styles and even different stylistic periods in the creative life of a composer. 
There are at least two important graph theoretical notions that can allow us to introduce 
hierarchies in a given piece, namely the modular structure (or community structure) of 
a graph and its motifs. The first concept, that is, the modularity or community structure, 
is easy to grasp informally but not so easy to define precisely in mathematical 
terms. As its name suggests, a module or community of a graph is a part of it that has 
more connections between its nodes than with nodes not belonging to it. As a simple 
example consider the network shown in Fig. 11. 


Fig. 11. Simple network with a clear modularity structure. 


It can be observed at first sight that there are four different communities. Node 10 is 
in a class by itself, as can be seen by removing it: the resulting network 
splits into three similar subnetworks (nodes 1, 2 and 3, nodes 4, 5 and 6, and 
nodes 7, 8 and 9). Taking this into account, we can speak of different communities at 
different hierarchies. At level zero (distinguishing only connected components), a 
single community, the whole graph, is obtained. At hierarchy level 1, a distinguished 
module (node 10) is detected together with the three subnetworks already mentioned (see 
Fig. 11). 
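The effect of removing the central node can be checked with a short script. The edge list below is an assumption: the text gives only the node groups, so we take three triangles, each attached to node 10 through one of its nodes, as one network consistent with the description.

```python
def components_without(edges, removed):
    """Connected components (as sorted lists) after deleting one node."""
    adj = {}
    for a, b in edges:
        if removed in (a, b):
            continue  # drop every edge touching the removed node
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    seen, comps = set(), []
    for start in sorted(adj):
        if start in seen:
            continue
        stack, comp = [start], set()
        while stack:
            node = stack.pop()
            if node not in comp:
                comp.add(node)
                stack.extend(adj[node] - comp)
        seen |= comp
        comps.append(sorted(comp))
    return comps


# Assumed edge list: three triangles, each linked to the hub node 10.
EDGES = [(1, 2), (2, 3), (1, 3), (4, 5), (5, 6), (4, 6),
         (7, 8), (8, 9), (7, 9), (3, 10), (6, 10), (9, 10)]

print(components_without(EDGES, removed=None))   # level zero: one component
print(components_without(EDGES, removed=10))     # level 1: three subnetworks
```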


7 The same consideration can be applied to different performers, focusing on the rhythmic aspects, 
rather than the melodic ones, which in principle are fixed. Of course ornamentation aspects or 
improvised music can also be approached using these tools. 

Fig. 12. Four motives of 5 and 4 nodes. 

We can apply the same procedure to the Dorian cantus firmi example. This analysis 
was carried out with the same platform and in this case we find that D plays the central 
role. This procedure can be repeated with fragments of a piece or with larger pieces in 
order to establish structural centers, without necessarily referring to a particular tonal idiom. 

Another important notion that can be very useful not only in analysis but also in 
automatic music generation (see the concluding section) is the graph theoretical concept 
of motif. Roughly speaking, a motif in this technical sense is a small subgraph or 
unit that appears many times in the complete graph. “Small” is usually taken to mean 3 
or 4 nodes. For instance, referring back to the example in Fig. 11, we can say that the modules 
of size 3 are in this case also motifs, since they appear often in the graph. The larger the 
graph, the more complex the motif structure is. Moreover, associated with each motif 
there is a weight, which indicates how frequently it appears. We present in the final 
figures the motif structure for the Dorian cantus firmi example as obtained with the 
same platform and a special plugin for motif analysis, for the most important motifs of 4 
and 5 nodes (Fig. 12). 
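As the simplest possible instance of counting a small repeated subgraph, the sketch below counts triangles (three mutually connected nodes) in an undirected graph. It is only an illustration: the edge list is an assumed network of three triangles attached to a hub, and dedicated motif-detection tools typically also compare such counts against randomized graphs, a step omitted here.

```python
from itertools import combinations


def count_triangles(edges):
    """Count 3-node subgraphs in which all three pairs are joined (undirected)."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return sum(
        1
        for a, b, c in combinations(sorted(adj), 3)
        if b in adj[a] and c in adj[a] and c in adj[b]
    )


# Assumed edge list: three triangles, each linked to the hub node 10.
EDGES = [(1, 2), (2, 3), (1, 3), (4, 5), (5, 6), (4, 6),
         (7, 8), (8, 9), (7, 9), (3, 10), (6, 10), (9, 10)]

print(count_triangles(EDGES))
```

Checking all triples is cubic in the number of nodes, which is acceptable for a 12-node pitch-class graph but not for large networks.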

The importance of motifs in our approach is that they can be identified with basic 
units with meaning, in a similar way as words. As already pointed out, in music there 
are no clearly identifiable minimal structures and in traditional music analysis there is 
no systematic way of carrying out this identification or segmentation and in many cases 
this is done “by hand”. Our proposal is by no means unique, but provides a starting 
point for further analysis. Again, contrasting this with the example by Schoenberg, we 
provide the motif analysis of the fragment. Without elaborating more, the difference 
with the cantus firmi is clear (Fig. 13). 



Fig. 13. Motives of size 5 in Schoenberg example. 


As a final illustration we present a more realistic example. We consider all the 
Allemandes attributed to Louis Couperin and, using the pitch class distributions of both 
the bass and the melodic lines, we apply several classification techniques ranging from 
principal component analysis to machine learning. We leave the discussion of the 
musicological implications of this example, as well as the historical context and the 
attribution problem that motivated it, for another paper (Knights et al. 2017a) and 
(Wilson). However, we mention that even at this level of generality the classification 
schemes correctly separate major from minor pieces as well as melodic from bass lines, 
and allow for the identification of outliers (Fig. 15). 

The table in Fig. 14 lists all the pieces used. The number refers to the catalogue 
number in A. Curtis’s edition (Curtis 1970). 


Piece   Tonality  Voice      Tag        Cluster  Distance   Distance  1-class SVM     Outlier
                                                 (cluster)  (data)    score           (score < 0)
------  --------  ---------  ---------  -------  ---------  --------  --------------  -----------
CC 02*  A minor   Bass line  CC 02 (b)  3        50.7       51.2      0.581 ± 0.029
                  Melody     CC 02 (m)  1        35.9       37.8      0.015 ± 0.002   *
CC 08   A minor   Bass line  CC 08 (b)  3        17.4       27.8      0.581 ± 0.029
                  Melody     CC 08 (m)  1        33.5       29.1      0.581 ± 0.029
CC 13   A minor   Bass line  CC 13 (b)  3        29.6       45.5      0.638 ± 0.047
                  Melody     CC 13 (m)  1        12.0       10.7      1.229 ± 0.039
CC 14   A minor   Bass line  CC 14 (b)  3        47.9       45.5      0.806 ± 0.045
                  Melody     CC 14 (m)  1        18.3       13.7      0.581 ± 0.029
CC 19   B minor   Bass line  CC 19 (b)  3        39.2       40.5      0.301 ± 0.022
                  Melody     CC 19 (m)  1        16.7       22.9      0.755 ± 0.045
CC 23*  C major   Bass line  CC 23 (B)  4        40.0       30.0      -0.036 ± 0.006  *
                  Melody     CC 23 (M)  2        16.0       15.0      1.008 ± 0.045
CC 34*  C minor   Bass line  CC 34 (b)           65.2       493.5     -0.053 ± 0.001  *
                  Melody     CC 34 (m)  1        47.8       50.5      0.581 ± 0.029
CC 40*  D major   Bass line  CC 40 (B)  -        105.2      207.4     -0.421 ± 0.021  *
                  Melody     CC 40 (M)  2        31.7       24.4      0.581 ± 0.029
CC 46   D minor   Bass line  CC 46 (b)  3        30.5       27.9      0.581 ± 0.029
                  Melody     CC 46 (m)  1        21.2       42.8      0.581 ± 0.029
CC 65   E minor   Bass line  CC 65 (b)  3        28.2       26.9      0.719 ± 0.044
                  Melody     CC 65 (m)  1        14.4       19.4      0.581 ± 0.029
CC 69   F major   Bass line  CC 69 (B)  -        56.3       127.5     0.581 ± 0.029
                  Melody     CC 69 (M)  2        23.3       18.9      0.581 ± 0.029
CC 77   F major   Bass line  CC 77 (B)  3        28.3       40.1      1.258 ± 0.068
                  Melody     CC 77 (M)  2        17.7       12.2      0.581 ± 0.029
CC 86*  G major   Bass line  CC 86 (B)  4        69.5       30.0      -0.152 ± 0.012  *
                  Melody     CC 86 (M)  2        33.6       27.8      0.177 ± 0.002
CC 93   G minor   Bass line  CC 93 (b)  3        22.3       26.0      0.749 ± 0.045
                  Melody     CC 93 (m)  1        19.7       18.5      0.581 ± 0.029


Fig. 14. Allemandes from the A. Curtis edition of the keyboard music by L. Couperin. CC 
stands for Couperin and Curtis. The numbers are the same as in the edition. B and M stand for 
bass and melody in major keys and, correspondingly, b and m for minor keys. 

Fig. 15. Classification techniques for the Allemandes mentioned in the text. 
(Panel title: Outlier detection via 1-class SVM.) 


3 Conclusions 

Another aspect that has not been addressed in detail in this paper is the essential fact 
that in music there is always the possibility of analyzing or understanding a fragment in 
retrospect. That is, a composer, performer or listener can make sense of the musical 
structure a posteriori. To some extent this is taken into account by the methodology 
proposed here, since some recursive elements are embedded in the hierarchical 
analysis. However, a systematic incorporation of this fact is still to be developed. 
A possible approach that has already been pursued is the 
application of hidden Markov chains or chains of higher order, in which the state of the 
system in the next iteration depends on a finite number of previous states. Still, 
probably the most difficult feature of musical structure to capture is the extremely 
complex interrelationship between melodic, rhythmic and harmonic elements, which 
a normal listener seems able to process recursively and hierarchically in the 
most natural way. 

This leads to the question of how we actually listen to music and make sense of 
musical structure. Such a question in turn incorporates cognitive and physiological 
aspects that are the subject of very active current research. As pointed out in the 
introduction, the case of the Couperin brothers provides us with an ideal case study to 
which to apply the methodology presented here, and this is the subject of a forthcoming 
work (Knights et al. 2017a). We also refer to (Knights et al. 2017b) for an application 
to elucidating the joint attribution to Taverner and Tye of the motet O splendor gloriae. 
However, there are countless possibilities of application. Even in the work of a 
composer like J. S. Bach, who has been thoroughly studied, stylistic issues are still far from 
being settled (Jones 2015). 

Finally, we also mention that the extraction of modules and motifs provides a 
starting point to generate music in a standard way. Once a motif structure has been 
defined, one can use simulation techniques to generate patterns with similar 
characteristics (see (Hardoon et al. 2014) for a recent work in which this technique is used). 
We would like to stress the fact that this methodology can provide the basic 
recombination material that was proposed by Cope, which we referred to in the introduction. 
We also leave for a later work the more detailed study of automatic music generation 
using these techniques, since it constitutes a subject by itself. 
Abstract. Symbolic melodic similarity aims to evaluate the degree of 
likeness of two or more sequences of notes. In this work, we propose the 
use of fuzzy c-means clustering as a tool for the measurement of the 
similarity between two melodies with a different number of notes. Moreover, 
we present an algorithm, FOCM, implemented in a computer program 
written in C++, able to read two melodies from files in MusicXML 
format and to perform the clustering to calculate the dissimilarity between 
any two melodies. In addition, for each iteration step in the 
convergence process of the algorithm, a family of intermediate states 
(transition melodies) is obtained that can be used as new thematic material. 
This last feature could be especially useful in the near future as a 
complement in computer-aided composition. 


Keywords: Fuzzy clustering • Symbolic melodic similarity • 
Computer-aided composition 


1 Introduction 

Symbolic melodic similarity is fundamental in the field of computer-aided 
composition [1,13]. The measure of the similarity/dissimilarity between melodies is 
a key factor both in defining transitions between two different melodies and in 
generating new melodic material from a preexisting melody [11]. In this 
paper, we propose a procedure to measure the similarity between two melodies 
by using an algorithm based on fuzzy clustering. 

Our starting point will be the characterization of musical notes as points in 
a metric space, where the coordinates represent musical characteristics. In this 
way, a melody would be an ordered sequence of notes. Measuring the dissimilarity 
between two monophonic melodies with an equal number of notes can be done by 
comparing, one by one, each note of the first melody with the note of the same order 
in the second. However, for our purpose we need to establish 
a generic comparison mechanism allowing the measurement of the dissimilarity 
of two (monophonic or polyphonic) melodies with different numbers of notes. For 
this, we will use an algorithm based on fuzzy c-means clustering. 

The purpose of that clustering would be to establish to what extent the notes 
of a first melody are related to the notes of a second one. After the clustering, 
we will be able to calculate a global difference 
between the melodies (dissimilarity) by aggregating the partial distances weighted 
by their corresponding membership coefficients. In the fuzzy logic context, the 
membership functions are the extension of the characteristic set functions [16]. 
While characteristic functions take values 0 or 1, membership functions can take 
any value between 0 and 1. Therefore, the membership coefficients express the 
membership degree of an element to a cluster [15,16]. 

Subsequently, in the comparison of the general dissimilarity, we will take into 
account the order of the notes in each melodic sequence. For this, we will use 
neighborhood functions. These functions will allow us to define a comparison 
in which the clustering of the notes is influenced by their position within the 
sequence defining the melody. 

In order to verify the utility of our proposal, we present an algorithm, FOCM, 
implemented in a computer program written in C++. This algorithm will allow us 
to read two melodies from files in MusicXML format and to perform the 
clustering to calculate the dissimilarity between them. In addition, for each 
iteration step in the convergence process of the algorithm, a family of intermediate 
melodies will be obtained that can be used as new thematic material. This last 
feature could be especially useful in the near future, as an aid in computer 
composition. For this reason, in the last section we provide an example in which 
the number of intermediate melodies created by our method is shown. As an 
instance, we present one of the intermediate melodies obtained from the 
measure of dissimilarity between two passages. 

2 Preliminary Concepts 

A musical note determined by k characteristics (pitch, intensity, duration, 
timbre, etc.) can be expressed as a vector in $\mathbb{R}^q$, where $q \geq k$. Of course, each 
characteristic does not have to correspond to a single coordinate. For example, 
in [7,8] pitch is defined as a fuzzy set [16], 

$$P = \{(f, \mu_P(f)),\ f \in [f_0, f_1]\}, \tag{1}$$

where $f$ represents the frequency in Hz and $\mu_P(f) \in [0,1]$ is the membership 
degree of $f$ to a note in a given tuning system. In this case, the fuzzy pitch 
would be given by two coordinates $(f, \mu_P(f))$. 

The simplest way to represent a musical note is by setting q = 2, the 
pitch and the duration, and establishing two bijections from the pitch and the 
duration of the note to the coordinates of $x \in \mathbb{R}^2$. However, it is possible to work with a higher 
number of dimensions in order to represent more accurately the characteristics 
of music. New properties belonging to the requirements of other kinds of music 
or styles [14], like non-Western music, computer-generated music or 
electroacoustics, could be easily assimilated as extra dimensions. 

As usual, the distance between two notes whose frequencies are $f_1$ 
and $f_2$ can be easily calculated in cents [3,8] by means of the expression 

$$d(f_1, f_2) = 1200 \times \left| \log_2 \frac{f_1}{f_2} \right| \ \text{cents}. \tag{2}$$


According to [12], the MIDI protocol defines the midi-pitch of a note as an 
integer in the range [0,127], with middle C being C4 = 60 and the reference 
A4 = 69. For the equal temperament of 12 notes [3], there are 100 cents of 
difference between two notes separated by one midi-pitch number (semitone). 
If a concert pitch frequency $f_{A4}$ (usually 440 Hz) corresponds to the midi-pitch 
number 69, then the midi-pitch number of a frequency $f$ is 

$$\nu = 69 + 12 \log_2 \frac{f}{f_{A4}}. \tag{3}$$
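
As a quick numerical check of expressions (2) and (3), the following minimal Python sketch evaluates both. The function names and the default of 440 Hz for the reference frequency are illustrative choices, and the midi-pitch is left unrounded, so it may be fractional for frequencies outside the tempered grid.

```python
import math

def cents(f1, f2):
    """Distance in cents between two frequencies in Hz, as in expression (2)."""
    return 1200.0 * abs(math.log2(f1 / f2))

def midi_pitch(f, f_a4=440.0):
    """Midi-pitch number of frequency f, as in expression (3); not rounded."""
    return 69.0 + 12.0 * math.log2(f / f_a4)

# One octave apart, and the frequency usually quoted for middle C.
print(cents(440.0, 880.0))
print(round(midi_pitch(261.6256)))
```

Rounding the result of `midi_pitch` to the nearest integer gives the tempered note closest to the frequency.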


Taking the whole note as the unit, it is easy to define a note's 
duration coefficient $\delta \in \mathbb{R}$. A half note has a coefficient 1/2, a quarter note 1/4, 
a quaver 1/8, etc., i.e. 

$$\delta = \frac{1}{2^{a}}, \quad -1 \leq a \leq 7. \tag{4}$$

In addition, each dotted note multiplies its duration by the factor 

$$\alpha = \sum_{k=0}^{b} \frac{1}{2^{k}} = \frac{2^{b+1} - 1}{2^{b}}, \tag{5}$$

where b is the number of dots of the note. On the other hand, tuplets (described 
in [2] as reading c notes in the space of d) are notated by the expression c : d, 
and modify the duration of each note by the factor 

$$\gamma = \frac{d}{c}. \tag{6}$$

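Taking into account expressions (4), (5) and (6), a number r of tied notes 
will have a duration 

$$\delta = \sum_{i=1}^{r} (\delta_i \cdot \alpha_i \cdot \gamma_i) = \sum_{i=1}^{r} \frac{1}{2^{a_i}} \cdot \frac{2^{b_i+1} - 1}{2^{b_i}} \cdot \frac{d_i}{c_i}. \tag{7}$$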
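
Expressions (4) to (7) can be checked with exact rational arithmetic. The sketch below is only an illustration: the function names and the tuple layout `(a, b, c, d)` are assumptions made here, not part of the original program.

```python
from fractions import Fraction

def dot_factor(b):
    """Factor (5): sum_{k=0}^{b} 1/2^k = (2^(b+1) - 1) / 2^b."""
    return Fraction(2 ** (b + 1) - 1, 2 ** b)

def note_duration(a, b=0, c=1, d=1):
    """Duration 1/2^a (4) with b dots (5) inside a c:d tuplet (6)."""
    base = Fraction(1, 2) ** a          # a = -1 gives 2 (a double whole note)
    return base * dot_factor(b) * Fraction(d, c)

def tied_duration(notes):
    """Expression (7): sum of the durations of r tied notes."""
    return sum((note_duration(*n) for n in notes), Fraction(0))

# A sixteenth note in a 3:2 triplet, a dotted eighth, and a half tied to a quarter.
print(float(note_duration(4, c=3, d=2)))
print(float(note_duration(3, b=1)))
print(float(tied_duration([(1,), (2,)])))
```

The first two values correspond to the durations 0.041667 and 0.187500 that appear in the melody example of this section.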


Once the concept of musical note has been defined, we can express a melody 
as an ordered sequence of n notes, each note of the melody being a point in a 
metric space. 

Definition 1. A melody is a sequence $\mathcal{M} = \{x_i\}_{i=1}^{n}$, where each $x_i \in \mathbb{R}^q$ is a 
musical note. 
For instance, let us consider the melody in Fig. 1. If we were only interested 
in the duration $\delta$ and in the midi-pitch number $\nu$ of each note, the fragment 
could be expressed as the following sequence of 14 notes: 

$\mathcal{M}_1 = \{(\delta_i, \nu_i)\}_{i=1}^{14}$ = {(0.041667, 67), (0.041667, 69), (0.041667, 70), (0.166667, 69), 
(0.166667, 67), (0.166667, 70), (0.250000, 72), (0.041667, 69), (0.041667, 72), 
(0.041667, 70), (0.125000, 69), (0.187500, 67), (0.062500, 65), (0.750000, 67)}. 

If the melody is polyphonic, as in Fig. 2, the pitch of the notes is represented by 
a vector $P \in \mathbb{R}^k$, where k is the least common multiple of the numbers of voices 
appearing in the melody. In Fig. 2 notes with 1, 2 and 3 voices appear, so k = 
l.c.m.(1, 2, 3) = 6. Consequently, if we are only interested in the duration and pitch 
of the notes, these can be expressed by using 7 coordinates. The corresponding 
melody would be $\mathcal{M}_2$: 

$\mathcal{M}_2 = \{(\delta_i, P_i)\}_{i=1}^{5}$ = {(0.125; 67, 67, 67, 71, 71, 71), (0.0625; 64, 64, 64, 67, 67, 67), 
(0.0625; 67, 67, 67, 71, 71, 71), (0.125; 64, 64, 67, 67, 71, 71), 
(0.125; 71, 71, 71, 71, 71, 71)}. 
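
The padding of chords to a common length k can be sketched as follows. This is a minimal illustration under the assumption that each voice is simply repeated k divided by the number of voices times, which reproduces the pitch vectors listed above; the function names are hypothetical.

```python
from functools import reduce
from math import gcd

def lcm(a, b):
    """Least common multiple of two positive integers."""
    return a * b // gcd(a, b)

def expand_pitches(chords):
    """Repeat each voice so that every chord becomes a vector of length k."""
    k = reduce(lcm, (len(chord) for chord in chords), 1)
    return [[p for p in chord for _ in range(k // len(chord))] for chord in chords]

# Pitch content of the five notes of the polyphonic example (1, 2 and 3 voices).
chords = [(67, 71), (64, 67), (67, 71), (64, 67, 71), (71,)]
for vector in expand_pitches(chords):
    print(vector)
```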



Fig. 1. Example of a melodic line to be represented in the duration-pitch plane. 



Fig. 2. Example of polyphonic melody. 


3 Comparison of Melodies 

If we consider two melodies $\mathcal{M}_A$ and $\mathcal{M}_B$, both belonging to a q-dimensional 
metric space, it is only possible to measure a well-defined distance between them 
if they have the same number of notes. This does not necessarily mean that both 
melodic lines have the same duration expressed in units of time; however, they 
need to have the same number of points. 

In order to compare those melodies we will first calculate a total distance 
by accumulating the partial distance between each pair of notes $x_i$, $y_i$, $i \geq 1$, 
respecting the established order of the sequence of the points of both melodies. 
Definition 2. Let $\mathcal{M}_A = \{x_1, \ldots, x_n\}$ and $\mathcal{M}_B = \{y_1, \ldots, y_n\}$ be two 
melodies with n notes, belonging to a q-dimensional space $\mathbb{R}^q$. Given a distance 
function $d : \mathbb{R}^q \times \mathbb{R}^q \to \mathbb{R}$, the distance between $\mathcal{M}_A$ and $\mathcal{M}_B$ can be defined as 

$$D(\mathcal{M}_A, \mathcal{M}_B) = \mathcal{A}\{d(x_1, y_1), \ldots, d(x_n, y_n)\}, \tag{8}$$

where $\mathcal{A}$ is a prefixed aggregation operator [15]. 

In this work, unless otherwise stated, we will use the arithmetic mean 
as the operator $\mathcal{A}$, that is 

$$D(\mathcal{M}_A, \mathcal{M}_B) = \frac{1}{n} \sum_{i=1}^{n} d(x_i, y_i). \tag{9}$$

Nevertheless, regardless of the operator chosen, it is easy to verify the following 
result: 

Proposition 1. Assuming the previous notation, the following inequalities hold 

$$\min_i d(x_i, y_i) \leq D(\mathcal{M}_A, \mathcal{M}_B) \leq \max_i d(x_i, y_i). \tag{10}$$
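
A small sketch of expression (9) with the bounds of Proposition 1, assuming the Euclidean distance as d (the text allows any distance function). The second melody is a hypothetical variation invented for the example.

```python
import math

def mean_distance(melody_a, melody_b, d=math.dist):
    """Arithmetic-mean distance (9) plus the two bounds of Proposition 1."""
    if len(melody_a) != len(melody_b):
        raise ValueError("both melodies need the same number of notes")
    partial = [d(x, y) for x, y in zip(melody_a, melody_b)]
    return sum(partial) / len(partial), min(partial), max(partial)

# Notes are (duration, midi-pitch) pairs.
melody_a = [(0.041667, 67), (0.041667, 69), (0.041667, 70)]
melody_b = [(0.041667, 69), (0.041667, 69), (0.041667, 74)]  # hypothetical variation
mean, lower, upper = mean_distance(melody_a, melody_b)
print(mean, lower, upper)
```

Any other aggregation operator lying between the minimum and the maximum of its arguments can be swapped in for the mean without changing the bounds.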

As is well known [3], the Weber-Fechner law approximates the psychological 
rules of human perception of intensity or pitch. This idea can be easily 
incorporated into the calculation of the distance. For example, if we express 
the notes x with three coordinates $(x_1, x_2, x_3)$ representing duration, pitch and 
intensity, respectively, in [9] the following distance is used 

$$d(x, y) = \alpha \cdot |x_1 - y_1| + \beta \cdot |\log(x_2 / y_2)| + \gamma \cdot |\log(x_3 / y_3)|, \tag{11}$$

where $\alpha$, $\beta$ and $\gamma$ are some prefixed constant values. 

If we want to compare two melodies with different numbers of notes, Definition 2 
has to be generalized. In fact, in the symbolic melodic similarity literature it is 
possible to find several definitions of distance between 
two melodies of different length [5, 10]. The objective of many of these 
works is to approximate human perception as closely as possible [11]. With this 
aim, different techniques have been proposed, ranging from the geometric structure 
of the melodies [1] to fuzzy logic [11], for instance. 

A definition of an average distance based on the clustering of two melodies 
$\mathcal{M}_A = \{x_1, \ldots, x_n\}$ and $\mathcal{M}_B = \{y_1, \ldots, y_m\}$, with n > m, allows us to 
estimate how far away melody $\mathcal{M}_A$ is from melody $\mathcal{M}_B$. Despite the fact that 
this measurement will not satisfy the requirements of a distance function, the 
result provides some useful information about the degree of similarity between 
these two melodies. 

When a classical clustering process, e.g. c-means clustering, is applied to a 
general data set X of information, the result is a Boolean partition of X into 
c clusters, so each element of X belongs to only one cluster. Related to the 
comparison of melodies, we can use this procedure to cluster the set of n notes 
of melody $\mathcal{M}_A$ into m subsets. Once this is finished, we will be able to associate 
each subset in $\mathcal{M}_A$ to a note in $\mathcal{M}_B$ and finally calculate an average distance 
from every point of each subset in $\mathcal{M}_A$ to its corresponding note in $\mathcal{M}_B$. 

The global dissimilarity of the two melodies would be calculated by 
aggregating the partial average distances. However, in carrying out this procedure 
we have to accept two arguable assertions: 

1. It is assumed that comparing each note of $\mathcal{M}_A$ to only one note of $\mathcal{M}_B$ makes 
musical sense. 

2. In the process of comparing notes the order information is omitted. This is a 
key question in musical terms. 

In what follows, a new proposal based on fuzzy logic will be presented. Real 
features of musical fact can be better represented by this new approach. With this 
objective, we will use fuzzy clustering applied to the calculation of a dissimilarity 
measure between two melodies of different number of notes. 


3.1 Fuzzy C-Means Clustering (FCM) 

Fuzzy c-means is a clustering method initially developed by Dunn [6] in 1973, 
based on the statement that any element of a given set is able to belong to more 
than one cluster. Thus, the fuzzy clustering method will provide a membership 
function that describes the degree to which each element belongs to any centroid. As 
explained in [4], the generalization of fuzzy c-means algorithms comes from 
the iterative minimization of an objective functional. 

Definition 3. Let the data set $X = \{x_1, x_2, \ldots, x_n\} \subset \mathbb{R}^q$. Let v be a set of 
cluster centers $v = (v_1, v_2, \ldots, v_m)$, with $v_j \in \mathbb{R}^q$ and m < n. Fuzzy c-means 
functionals are defined as 

$$J_\lambda = \sum_{i=1}^{n} \sum_{j=1}^{m} (u_{ij})^{\lambda} (d_{ij})^{2}, \tag{12}$$

where $d_{ij}^2 = \| x_i - v_j \|^2$, $\| \cdot \|$ being any inner product induced norm on $\mathbb{R}^q$, 
$\lambda \in [1, \infty)$ is the weighting exponent (degree of fuzziness of the process), and $u_{ij}$ 
is the membership coefficient of $x_i$ to the cluster j. 

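The fuzzy clustering is achieved through an iterative optimization of $J_\lambda$, 
updating, at each iteration, the membership coefficients $u_{iq}$ as well as the cluster 
centers $v_q$ by using the following expressions 

$$u_{iq} = \left[ \sum_{k=1}^{m} \left( \frac{d_{iq}}{d_{ik}} \right)^{\frac{2}{\lambda - 1}} \right]^{-1}, \qquad v_q = \frac{\sum_{i=1}^{n} (u_{iq})^{\lambda} x_i}{\sum_{i=1}^{n} (u_{iq})^{\lambda}}. \tag{13}$$

The matrix U is a fuzzy partition of X, formed by the membership 
coefficients $u_{iq}$, 

$$U = (u_{ij}) = \begin{pmatrix} u_{11} & \cdots & u_{1m} \\ \vdots & \ddots & \vdots \\ u_{n1} & \cdots & u_{nm} \end{pmatrix}. \tag{14}$$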
As convergence condition of any fuzzy clustering we have 

$$\sum_{j=1}^{m} u_{ij} = 1, \quad 1 \leq i \leq n. \tag{15}$$

3.2 Fuzzy C-Means Algorithm 

In what follows we will show the implementation of the Fuzzy c-Means Clustering 
Algorithm proposed by Bezdek in [4]. 

FCM-Algorithm 

STEP 1. Fix a number of clusters m, $2 \leq m < n$. Choose any inner product 
norm metric for $\mathbb{R}^q$ and fix $\lambda$, $1 < \lambda < \infty$. Initialize $U^{(0)}$. 

STEP 2. Calculate the fuzzy cluster centers $\{v_j^{(k)}\}$ with $U^{(k)}$ and 
expression (13). 

STEP 3. Update $U^{(k+1)}$ using expression (13) and $\{v_j^{(k)}\}$. 

STEP 4. Compare $U^{(k+1)}$ to $U^{(k)}$ using a convenient matrix norm, 
$\varepsilon \in (0,1)$ being an arbitrary termination criterion. If $\| U^{(k+1)} - U^{(k)} \| \leq \varepsilon$ 
then stop, otherwise set k = k + 1 and return to STEP 2. 
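
The four steps can be sketched in plain Python as follows. This is an illustrative implementation, not the authors' program: it assumes the Euclidean norm, starts from given initial centers rather than from an initial matrix $U^{(0)}$, uses the maximum absolute entry difference as matrix norm, and adds an iteration cap; all names are hypothetical.

```python
import math

def memberships(points, centers, lam):
    """Membership update of (13): u_ij = 1 / sum_k (d_ij / d_ik)^(2/(lam-1))."""
    e = 2.0 / (lam - 1.0)
    U = []
    for x in points:
        d = [math.dist(x, v) for v in centers]
        if 0.0 in d:  # point sits on a center: share membership among those centers
            hits = [1.0 if dj == 0.0 else 0.0 for dj in d]
            U.append([h / sum(hits) for h in hits])
        else:
            U.append([1.0 / sum((dj / dk) ** e for dk in d) for dj in d])
    return U

def update_centers(points, U, lam):
    """Center update of (13): v_j = sum_i u_ij^lam x_i / sum_i u_ij^lam."""
    q = len(points[0])
    centers = []
    for j in range(len(U[0])):
        w = [row[j] ** lam for row in U]
        total = sum(w)
        centers.append(tuple(sum(wi * x[c] for wi, x in zip(w, points)) / total
                             for c in range(q)))
    return centers

def fcm(points, centers, lam=2.0, eps=1e-6, max_iter=200):
    """Iterate STEP 2 to STEP 4 until the memberships change by at most eps."""
    U = memberships(points, centers, lam)
    for _ in range(max_iter):
        centers = update_centers(points, U, lam)
        U_new = memberships(points, centers, lam)
        change = max(abs(a - b) for ra, rb in zip(U_new, U) for a, b in zip(ra, rb))
        U = U_new
        if change <= eps:
            break
    return U, centers

points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
U, centers = fcm(points, [(1.0, 0.0), (9.0, 0.0)])
print(centers)
```

Each row of the returned matrix sums to 1, which is condition (15).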

4 Measuring Dissimilarity by Means of Fuzzy Clusters 

Let us consider two melodies $\mathcal{M}_A$ and $\mathcal{M}_B$ with different numbers of notes. 
We will now make a fuzzy partition of the notes from $\mathcal{M}_A$ with the initial 
cluster centers given by $\mathcal{M}_B$, and apply the FCM algorithm k times until the 
termination criterion is satisfied. Once the partition process is complete, we 
can define a dissimilarity function between $\mathcal{M}_A$ and $\mathcal{M}_B$ by using the final 
membership coefficients and the original cluster centers. 

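Definition 4. Let $\mathcal{M}_A = \{x_1, \ldots, x_n\} \subset \mathbb{R}^q$ and $\mathcal{M}_B = \{y_1, \ldots, y_m\} \subset \mathbb{R}^q$ be 
two melodies, where n > m. Let $d : \mathbb{R}^q \times \mathbb{R}^q \to \mathbb{R}$ be a distance function. Let $u_{ij}$ 
be the final membership coefficients calculated with the FCM algorithm. The average 
dissimilarity $\mathcal{D}$ from $\mathcal{M}_A$ to $\mathcal{M}_B$ is defined by 

$$\mathcal{D}(\mathcal{M}_A, \mathcal{M}_B) = \frac{1}{n \cdot m} \sum_{i=1}^{n} \sum_{j=1}^{m} u_{ij} \, d(x_i, y_j). \tag{16}$$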
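
Given a membership matrix, expression (16) is a weighted double sum. The sketch below assumes the Euclidean distance and uses a hand-written, hypothetical matrix U instead of the output of an actual FCM run.

```python
import math

def average_dissimilarity(melody_a, melody_b, U, d=math.dist):
    """Expression (16): (1 / (n * m)) * sum_i sum_j u_ij * d(x_i, y_j)."""
    n, m = len(melody_a), len(melody_b)
    total = sum(U[i][j] * d(melody_a[i], melody_b[j])
                for i in range(n) for j in range(m))
    return total / (n * m)

melody_a = [(0.5, 60), (0.5, 62), (0.5, 64)]
melody_b = [(0.5, 60), (0.5, 64)]
U = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]   # hypothetical memberships, rows sum to 1
print(average_dissimilarity(melody_a, melody_b, U))

# Reversing melody A together with its rows of U leaves the value unchanged,
# because (16) never looks at the position of a note inside the melody.
print(average_dissimilarity(melody_a[::-1], melody_b, U[::-1]))
```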

By construction, $\mathcal{D}$ does not consider the natural order of the sequence of 
notes within each melody. Thus, the partition that FCM algorithms calculate 
does not weight in any special way the notes whose degree of neighbourhood is 
stronger. As an illustrative example of this fact, in Fig. 3 it is possible to see 
three different melodies. Since Melody B is a complete retrogradation of Melody 
A, the average dissimilarity between melodies A and C has exactly the same 
value as the average dissimilarity between melodies B and C, 

$\mathcal{D}(\mathcal{M}_A, \mathcal{M}_C) = 0.23354$, $\mathcal{D}(\mathcal{M}_B, \mathcal{M}_C) = 0.23354$. 


This example shows that a comparison of different melodies without taking 
into account the order of the notes does not completely reflect musical reality. 
To avoid this, we will introduce a dependence on the order in the algorithm. 
In this way, higher weights will be given to the pairs of notes that share closer 
positions in the order of each melody, reducing the contribution to the global 
dissimilarity of the pairs of notes that are far away from an ordinal point of view. 
Neighbourhood functions will provide the information related to the order in 
which the pairs of notes must be compared. 

Definition 5. A continuous function $f : \mathbb{R}^2 \to \mathbb{R}$ is a neighbourhood function 
between two melodies if 

$$\int_{1}^{n} f(i, j) \, di < \infty, \quad \forall j \in \{1, 2, \ldots, m\}, \tag{17}$$

where n is the number of notes of $\mathcal{M}_A$ and m the number of notes of $\mathcal{M}_B$. 


If a correct setting for the neighbourhood function is defined, neighbourhood 
values of $i \in (j - \varepsilon, j + \varepsilon)$ will be assigned higher coefficients and the rest of 
the values will be assigned lower coefficients. 

The procedure will be the following: once the fuzzy partition U has been 
calculated, we will assign a weight to every element $u_{ij}$ by means of a specific 
neighbourhood function f(i,j). In order to comply with the FCM convergence 
criterion, we will normalize U as follows 

$$\bar{u}_{ij} = \frac{u_{ij} \cdot f(i, j)}{\sum_{k=1}^{m} u_{ik} \cdot f(i, k)}. \tag{18}$$


Example 1. Gaussian neighbourhood function 

$$f_G(i, j) = A \exp\left( -\frac{1}{2\sigma^2} \left[ i - 1 - \frac{(n-1)(j-1)}{(m-1)} \right]^2 \right). \tag{19}$$

In this function it is easy to see how the original $\mu$ value has been replaced by 
$1 + \frac{(n-1)(j-1)}{(m-1)}$. This expression has been obtained from the equation of a line 
$\mu = f(j)$ that passes through the points (1,1) and (m,n). Given the fixed values 
$n, m \in \mathbb{N}$, the shape of f(i, j) will change for each pair of values i, j. When j = 1, 
the Gaussian will be centered at i = 1, but when j = m it will be centered at 
i = n. 
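
To make the effect of the neighbourhood weighting concrete, the following sketch applies a Gaussian neighbourhood function to a uniform membership matrix and renormalizes each row as in (18). The amplitude and width parameters, their default values and the function names are assumptions for illustration; indices are 1-based as in the text and m is assumed to be at least 2.

```python
import math

def gaussian_neighbourhood(i, j, n, m, amplitude=1.0, sigma=1.0):
    """Gaussian in i centered on the line through (1, 1) and (m, n)."""
    center = 1.0 + (n - 1) * (j - 1) / (m - 1)
    return amplitude * math.exp(-((i - center) ** 2) / (2.0 * sigma ** 2))

def reweight(U, f):
    """Normalization (18): u_ij f(i,j) / sum_k u_ik f(i,k), so rows sum to 1 again."""
    out = []
    for i, row in enumerate(U, start=1):
        w = [u * f(i, j) for j, u in enumerate(row, start=1)]
        s = sum(w)
        out.append([x / s for x in w])
    return out

n, m = 5, 3
U = [[1.0 / m] * m for _ in range(n)]      # uniform memberships, no order information
U_bar = reweight(U, lambda i, j: gaussian_neighbourhood(i, j, n, m))
for row in U_bar:
    print([round(u, 3) for u in row])
```

After the reweighting, the first note of the longer melody leans towards the first cluster and the last note towards the last cluster, even though the input memberships were uniform.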

Our proposal is to modify the FCM algorithm in such a way that the order of 
the sequences of notes $\mathcal{M}_A = \{x_1, \ldots, x_n\}$ and $\mathcal{M}_B = \{y_1, \ldots, y_m\}$, n > m, 
is taken into account. With this objective we propose the following algorithm, 
named fuzzy ordered c-means (FOCM). 

4.1 FOCM-Algorithm 

STEP 1. Set $v^{(0)} = \{y_j\}$. Let m, n be the number of notes of $\mathcal{M}_B$ and 
$\mathcal{M}_A$, respectively. Choose any convenient neighbourhood function. 
STEP 2. Choose any inner product norm metric for $\mathbb{R}^q$, and fix $\lambda > 1$. 
Calculate the initial $U^{(0)}$ using (13), (18) and $v^{(0)}$. 
STEP 3. Calculate the fuzzy cluster centers $\{v_j^{(k)}\}$ with $U^{(k)}$ and 
equation (13). 
STEP 4. Update $U^{(k+1)}$ using Eqs. (13), (18) and $\{v_j^{(k)}\}$. 
STEP 5. Compare $U^{(k+1)}$ to $U^{(k)}$ using a convenient matrix norm, 
$\varepsilon \in (0,1)$ being an arbitrary termination criterion. If $\| U^{(k+1)} - U^{(k)} \| \leq \varepsilon$ 
then stop; otherwise set k = k + 1 and return to STEP 3. 
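
The five steps can be put together in a compact Python sketch. It is an illustration under several assumptions, not the authors' C++ program: Euclidean distance, a Gaussian neighbourhood function with unit amplitude, the maximum absolute entry difference as matrix norm, an iteration cap, and a tiny lower bound on distances to avoid division by zero. The list of cluster centers kept at every iteration is one possible reading of the intermediate melodies mentioned in the text, and the returned value is the double sum of memberships times distances to the original notes of the shorter melody, divided by n times m.

```python
import math

def focm(melody_a, melody_b, lam=2.0, sigma=1.0, eps=1e-6, max_iter=200):
    n, m, q = len(melody_a), len(melody_b), len(melody_a[0])

    def f(i, j):  # Gaussian neighbourhood, written for 0-based indices
        center = (n - 1) * j / (m - 1)
        return math.exp(-((i - center) ** 2) / (2.0 * sigma ** 2))

    def memberships(centers):
        e = 2.0 / (lam - 1.0)
        U = []
        for i, x in enumerate(melody_a):
            d = [max(math.dist(x, v), 1e-12) for v in centers]   # guard d = 0
            u = [1.0 / sum((dj / dk) ** e for dk in d) for dj in d]   # (13)
            w = [uj * f(i, j) for j, uj in enumerate(u)]              # (18)
            s = sum(w)
            U.append([wj / s for wj in w])
        return U

    def update_centers(U):
        out = []
        for j in range(m):
            w = [row[j] ** lam for row in U]
            total = sum(w)
            out.append(tuple(sum(wi * x[c] for wi, x in zip(w, melody_a)) / total
                             for c in range(q)))
        return out

    centers = [tuple(y) for y in melody_b]       # STEP 1
    U = memberships(centers)                     # STEP 2
    states = []
    for _ in range(max_iter):
        centers = update_centers(U)              # STEP 3
        states.append(centers)                   # one intermediate state per iteration
        U_new = memberships(centers)             # STEP 4
        change = max(abs(a - b) for ra, rb in zip(U_new, U) for a, b in zip(ra, rb))
        U = U_new
        if change <= eps:                        # STEP 5
            break
    value = sum(U[i][j] * math.dist(melody_a[i], melody_b[j])
                for i in range(n) for j in range(m)) / (n * m)
    return value, U, states

melody_a = [(0.25, 60), (0.25, 62), (0.25, 64), (0.25, 65), (0.25, 67)]
melody_b = [(0.5, 60), (0.5, 64), (0.5, 67)]
forward, U, states = focm(melody_a, melody_b)
backward, _, _ = focm(melody_a[::-1], melody_b)
print(forward, backward, len(states))
```

In this toy case the retrograde of the longer melody yields a larger value than the original order, which is the order sensitivity that the neighbourhood function is meant to introduce.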

Once the melodies have been compared by taking into account all the 
described characteristics, we can establish the following definition. 

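Definition 6. Let $\mathcal{M}_A = \{x_1, \ldots, x_n\} \subset \mathbb{R}^q$ and $\mathcal{M}_B = \{y_1, \ldots, y_m\} \subset \mathbb{R}^q$ be 
two melodies with different numbers of notes. Let $d : \mathbb{R}^q \times \mathbb{R}^q \to \mathbb{R}$ be a distance 
function. Let $u_{ij}$ be the final membership coefficients calculated with the FOCM 
algorithm. The average ordered dissimilarity $\overline{\mathcal{D}}$ from $\mathcal{M}_A$ to $\mathcal{M}_B$ is defined by 

$$\overline{\mathcal{D}}(\mathcal{M}_A, \mathcal{M}_B) = \frac{1}{n \cdot m} \sum_{i=1}^{n} \sum_{j=1}^{m} u_{ij} \, d(x_i, y_j). \tag{20}$$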

In what follows we show the utility of expression (20). For this, we will 
calculate the dissimilarity between different melodies. 


4.2 Computational Examples 

Example 2. We are now going to compare the melodies appearing in Fig. 4 using 
expressions (16) and (20). 

The dissimilarity values are $\mathcal{D}(\mathcal{M}_A, \mathcal{M}_B) = 0.68938$ and $\overline{\mathcal{D}}(\mathcal{M}_A, \mathcal{M}_B) = 
3.93483$. The reason behind the disparity in the obtained results is that when 
the order of the notes is not taken into account (that is, when we use $\mathcal{D}$), sharp notes in $\mathcal{M}_A$ 
are compared with sharp notes in $\mathcal{M}_B$ and flat notes in $\mathcal{M}_A$ are compared 
with flat notes in $\mathcal{M}_B$. In fact, when we give importance to the order ($\overline{\mathcal{D}}$), both 
melodies are quite different (the dissimilarity is almost six times greater with $\overline{\mathcal{D}}$ 
than with $\mathcal{D}$). In Fig. 5 we show a screenshot of our implementation 
of the FOCM algorithm. We can observe how, for example, the first two notes in $\mathcal{M}_B$ 
are associated to the first measure in $\mathcal{M}_A$, showing the above-mentioned 
differences. 



Fig. 4. Melodies of Example 2. 



Fig. 5. Final result of the clustering algorithm FOCM for $\mathcal{M}_A$ and $\mathcal{M}_B$. 

Example 3. Using the melodies displayed in Fig. 6, we will now show the capacity 
of our proposal to measure the dissimilarity of polyphonic melodies. 



Fig. 6. Melodies of Example 3. 
The obtained values are $\mathcal{D}(\mathcal{M}_A, \mathcal{M}_B) = 0.30695$, $\overline{\mathcal{D}}(\mathcal{M}_A, \mathcal{M}_B) = 0.46356$. 

As expected, the melodies in this example are more similar than the melodies 
in Example 2, and the differences between them increase when one of them is 
polyphonic and the other is not. 

Example 4. In Table 1 we provide an example showing how our proposal 
works. We have selected four passages of very well-known musical works: (1) = 
W. A. Mozart. Symphony No. 40. First movement. Measures 1-4, (2) = L. v. 
Beethoven. Symphony No. 6. First movement. Measures 1-4, (3) = J. Brahms. 
Symphony No. 3. Second movement. Measures 1-8, and (4) = B. Bartók. Music 
for strings, percussion and celesta. First movement. Measures 1-4. 


Table 1. Measurement of dissimilarity between melodies from Example 4. 


$\mathcal{M}_A$   $\mathcal{M}_B$   Dissimilarity   #States 
10 
12 
0.6642747436   194   0.6666523284   353 
3   4   0.0692844111   51   0.1451223554   429   0.1383288250   310 

In Table 1 we display the dissimilarity measures between melodies from 
Example 4, as well as the number of intermediate compositions (#States) 
generated by the algorithm with different values of the fuzzy coefficient used in the 
FOCM (Fig. 7). 




Fig. 7. One of the intermediate melodies obtained by the algorithm when measuring 
the dissimilarity between passages (1) and (3) from Example 4. 


5 Conclusions 

The evaluation of the degree of likeness of two melodies is nowadays a topic of 
great interest. By comparing the similarity of different melodies it is possible to 
find patterns, to extract rules and to identify structures, all key questions in the 
study of musical styles. In this work, we have proposed a fuzzy logic tool, fuzzy 
c-means clustering, for the measurement of the similarity between two melodies 
with different numbers of notes. 

The proposed FOCM algorithm allows us to define a measurement of the 
symbolic melodic dissimilarity between two different melodies, taking into account 
the order of the sequences of notes that each melody contains. To a certain 
extent, the definition of fuzzy c-means average ordered dissimilarity offers a 
geometric way to compare very different melodic lines that can be used for 
several purposes, like classification of melodies, computer-aided composition or 
musical-style recognition. 

Our proposal could also be applied to other fields of research in which it is 
necessary to estimate the degree of closeness of two different sequences of ordered 
information. 
Abstract. In this article, we look at the diachronic changes in tango 
harmony with the methods of network science. We are able to detect some 
significant tendencies of harmonic discourse in the first half of the 20th 
century, among them an enrichment of harmonic transitions and a power 
law frequency distribution of triadic chords with exponents compatible 
with a quite small rate of accretion of the vocabulary. 


1 Introduction 

Tango is undoubtedly the most transcendent collective cultural creation of the 
Río de la Plata region. Several texts give account of its history, spanning from 
the last decades of the 19th century to the present day [1,2]. In spite of this, to 
the best of our knowledge, no computational musicological study has focused 
specifically on tango and its diachronic evolution. 

The availability of big corpora of musical data has fostered quantitative 
evolutionary studies on American popular music [3], jazz harmony [4], electronic art 
music [5] and the musical influence of songs [6], to mention some examples. Recently, 
complex network methods have been employed to analyse pitch and timbre 
transitions both in individual works [7] and in large collections [8,9]. 

We consider chord transition networks built from sampling whole decades 
of a corpus of tango recordings. To this end, we assembled a database of 510 
recordings of tangos, composed between 1910 and 1960, by downloading all the 
tangos from the Web archive Todo Tango [10], and discarding those that 
contained extramusical elements such as speech or clapping. Some of the recordings 
were denoised using Adobe Audition. In case several recordings of the same 
tango were available, we preferred the one with the earliest recording date. The 
median number of years between composition and recording is 0. 

We built different dictionaries of pitch class chroma chords, which became 
the networks' nodes, as follows: we extracted the chromagram with MIRtoolbox 
for Matlab [11], using a frame size of 0.2 s without overlapping, keeping only the 
chroma with energy level above the average over all files, and circularly shifted 
them according to an estimation of the tonality of each tango to transpose them 
to the tonality of C, in order to have a common tonal framework. Borrowing the 
terminology of [8], we call the resulting chroma vectors codewords. 

The links of our pitch networks represent harmonic transitions between these 
codewords. Specifically, for the purpose of studying the evolution of harmonic 
discourse, we formed 5 collections of codewords, one for each decade in the 
year span 1910-1960. Two codewords are connected by a directed edge if they 
appear in consecutive analysis frames. In this way we are left with 5 networks 
corresponding to the periods 1910-1919 (16 tangos in the collection), 1920-1929 
(176 tangos), 1930-1939 (135 tangos), 1940-1949 (139 tangos) and 1950-1959 
(64 tangos). 
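
A minimal sketch of how such a transition network can be assembled from a sequence of per-frame codewords. The representation of a codeword as a sorted tuple of pitch classes, the decision to keep self-loops when a codeword repeats, and the toy frame sequence are all assumptions made for the example.

```python
from collections import Counter

def transition_network(frames):
    """Directed, weighted edges between codewords found in consecutive frames."""
    edges = Counter()
    for a, b in zip(frames, frames[1:]):
        edges[(a, b)] += 1
    return edges

C = (0, 4, 7)       # C major triad as pitch classes
G = (2, 7, 11)      # G major triad
frames = [C, C, G, C, G]
for (source, target), weight in transition_network(frames).items():
    print(source, "->", target, weight)
```

Pooling the frame sequences of all tangos of a decade, without linking the last frame of one recording to the first frame of the next, would give one network per period.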

Proceeding in this way, many of the generated codewords do not correspond 
to the standard harmonic vocabulary of Western tonal music: beyond the usual 
triadic chords, all kinds of chromatic harmonies, including the 12-note chromatic 
cluster, are obtained. For this reason, we considered two kinds of networks: 

a. Unfiltered networks, containing all codewords. 

b. Triadic networks, that is, filtered networks generated by only the triadic 
codewords of at most 4 chroma classes, including single chromas and dyads. We 
call a codeword triadic if it can be obtained, modulo 12, from one (or more) 
of its pitches by stacking consecutive minor or major thirds over it (or them). 
In these reduced networks, two triadic codewords are connected if the 
second chord is the next triad appearing after the first, ignoring non-triadic 
codewords in between. In this way we aim to represent a core harmonic 
skeleton, ignoring noisy frames and non-triadic chords arising from passing 
notes. 
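
The triadic filter can be sketched as follows. The test below follows a strict reading of the definition, namely that the whole pitch-class set must be a chain of minor or major thirds starting from one of its members; under this reading a bare fifth such as {C, G} is not triadic, which may or may not match the authors' treatment of dyads. Function names are hypothetical.

```python
from itertools import permutations

def is_triadic(pitch_classes):
    """True if the set (1 to 4 pitch classes) is a stack of minor/major thirds mod 12."""
    pcs = sorted(set(p % 12 for p in pitch_classes))
    if not 1 <= len(pcs) <= 4:
        return False
    for order in permutations(pcs):
        if all((b - a) % 12 in (3, 4) for a, b in zip(order, order[1:])):
            return True
    return False

def triadic_transitions(frames):
    """Link each triadic codeword to the next triadic one, skipping the rest."""
    kept = [f for f in frames if is_triadic(f)]
    return list(zip(kept, kept[1:]))

frames = [(0, 4, 7), (0, 1, 2), (2, 7, 11), (0, 3, 6, 9)]
print(triadic_transitions(frames))
```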


2 Results 

Based on the results of Serra et al. [8] and the models of vocabulary frequency of 
[15], we tried fitting the frequencies of codewords, sorted in decreasing order 
(that is, ordered by rank r, where r = 1 for the most frequent codeword and 
so forth), with a Zipf law of the form $z = C r^{-\alpha}$. For our fitting procedure, we 
used the approach of Clauset et al. [13,14]. In the case of unfiltered networks, 
we found, for all decades, good fits with truncated power laws (see Fig. 1a). The 
scaling exponents obtained vary very little over the years, ranging from $\alpha = 1.81$ 
to $\alpha = 1.94$. They are larger than those found in [8], pointing to a comparatively 
more compact and less innovative vocabulary [15], a fact which is to be expected 
since the corpus of Serra et al. is much more varied and massive, consisting of a 
million themes of popular music of many different genres. These exponents are 
also somewhat smaller than those found in [15] for the distribution of notes in 
classical music. 

For triadic networks, however, we did not find good fits with pure truncated 
power laws. One reason for this could he in the limited vocabulary considered here. 
A more appropriate model in this case is a shifted power law z = (a + 6r) _a , with 
coefficients adjusted to the vocabulary size. This law is derived partly from the 
Fig. 1. (a) Complementary cumulative distribution of codeword frequencies and their
fits by power laws for unfiltered networks. Curves are chronologically shifted by a factor
of 10 in the vertical axis for ease of visualization. (b) Rank-frequency distribution of
normalized codeword frequencies (with respect to the maximum frequency) and their fits by
shifted power laws for triadic networks. Curves are chronologically shifted by 1 in the
vertical axis.


hypothesis that, as the musical corpus grows in time, the frequency of harmonic
innovations goes as a power 1/α of the pre-existing language size [15].
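
To make the two functional forms concrete, the following minimal sketch evaluates a pure Zipf law and a shifted power law over the first few ranks. The coefficients a and b are hypothetical placeholders, not values from the study, and the code only evaluates the curves; it does not reproduce the fitting procedure of Clauset et al.

```python
def zipf(rank, C, alpha):
    """Pure power law z = C * r**(-alpha)."""
    return C * rank ** (-alpha)

def shifted_power_law(rank, a, b, alpha):
    """Shifted power law z = (a + b*r)**(-alpha)."""
    return (a + b * rank) ** (-alpha)

def normalize(values):
    """Divide by the maximum value, as done for the rank-frequency plots."""
    top = max(values)
    return [v / top for v in values]

ranks = range(1, 6)
alpha = 2.48        # smallest triadic exponent reported in the text
a, b = 1.0, 0.5     # hypothetical coefficients, for illustration only
curve = normalize([shifted_power_law(r, a, b, alpha) for r in ranks])
print([round(v, 3) for v in curve])
```

With a = 0 and b = 1 the shifted law reduces to the pure power law with C = 1, which is one way to see the shift as a correction that matters mostly at low ranks.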

We found good fits of this model with triadic codeword frequencies, with
exponents now varying between 2.48 and 6.05 (Fig. 1b). A tentative explanation
of the unusually large exponents, in the context of the aforementioned Zipfian
shifted power law, is that there is a very slow innovation rate in the
basic triadic vocabulary when we consider the whole collection of tangos from a
given decade (hence a very small innovation exponent 1/α), and that the changes
occur, instead, mostly at the level of non-triadic chords. In order to see whether the
codeword ranking remains stable across the years, we computed the Spearman
rank correlation coefficients of triadic codewords for all pairs of decades. They are
all high, with a minimum of 0.81, so frequent codewords remain frequent throughout
the history of tango. Tracking the relative frequencies of each triadic chord type
between 1910 and 1960, we observe some steady changes: augmented triads and
half-diminished sevenths show a twofold increase; minor sevenths also grow, although
to a lesser extent; there is a small transitory drop in minor triads in 1920-1930,
while major triads show a long-term falling tendency (Fig. 2).
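
As an illustration of the rank comparison between decades, the sketch below computes a Spearman coefficient for two frequency lists using the textbook formula for untied ranks. The frequency values are invented, and the assumption of no ties is a simplification that real codeword counts may not satisfy.

```python
def ranks(values):
    """Rank 1 goes to the largest value; assumes there are no ties."""
    order = sorted(range(len(values)), key=lambda i: -values[i])
    result = [0] * len(values)
    for position, index in enumerate(order, start=1):
        result[index] = position
    return result

def spearman(x, y):
    """Spearman rank correlation, 1 - 6*sum(d^2) / (n*(n^2 - 1))."""
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1 - 6 * d2 / (n * (n * n - 1))

# Hypothetical frequencies of the same four codewords in two decades.
decade_a = [50, 30, 10, 5]
decade_b = [40, 35, 8, 9]
rho = spearman(decade_a, decade_b)
print(round(rho, 2))
```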

Beyond codeword frequencies, harmonic networks give us a panorama of how 
musical discourse transits between the elements of the vocabulary of codewords, 
creating stylistic patterns that can be learnt by repeated listening experiences 
and subsequently lead to the formation of expectancies and surprise [16,17]. 
Fig. 2. Evolution of relative frequencies of triadic chord types.


Usual network measures and metrics can be easily interpreted in our context in
terms of their musical meaning. In the following, we consider several such typical
network coefficients.

Density is defined as the fraction of edges present, compared with all possible
n(n - 1)/2 edges (where n is the number of nodes of the network). All our
harmonic networks are sparse in this sense. For triadic networks density is at
most 0.21, while unfiltered networks are much sparser, with densities below 0.006.
Phrased in terms of predictability, this sparseness makes the statistical
learning of transition rules feasible, involving around 2000 different transitions
between the 140 possible triads.

Degrees. Node out-degree k is the number of neighbors following a codeword.
For unfiltered networks, the degree distribution is well fit by a truncated power
law P(k) ∝ k^{-γ} for k > k_min for the period 1920-1929, with exponent 2.42,
while in the other periods a better fit is a truncated power law with exponential
cutoff, with exponents in the interval [1.93, 2.12]. These values are similar to
those obtained by Serra et al. [8]. For triadic networks, good fits are also obtained
with truncated shifted power laws, with exponents ranging from 2.92 to 6.17.
While in unfiltered networks the median degree varies little, between 2 and 6, for
the triadic ones there is a big increase of degree connectivity from a median of
5 in 1910-1919 to values above 19 in the decades from 1920 to 1950, dropping
somewhat in 1950-1959 to 13. This indicates a strong tendency towards greater
freedom of harmonic discourse, and correlates with an increase in the
size of the vocabulary, from 122 codewords in 1910-1919 to 139 codewords in
1920-1929, with a gradual and small decay to 133 nodes in the ’50s. (Note that
the total number of possible triadic codewords is 151.)
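
Density and degrees as defined above can be computed directly from a list of transitions. The sketch below is an assumption-laden illustration: the chord labels are hypothetical, and for the density the edges are treated as unordered pairs so that the count matches the n(n - 1)/2 denominator given in the text.

```python
def density(edges):
    """Fraction of node pairs that are linked, out of n(n-1)/2 possible pairs."""
    nodes = {node for edge in edges for node in edge}
    pairs = {frozenset(edge) for edge in edges if edge[0] != edge[1]}
    n = len(nodes)
    return len(pairs) / (n * (n - 1) / 2)

def out_degrees(edges):
    """Number of distinct codewords that follow each codeword."""
    followers = {}
    for source, target in edges:
        followers.setdefault(source, set()).add(target)
        followers.setdefault(target, set())
    return {node: len(targets) for node, targets in followers.items()}

def in_degrees(edges):
    """Number of distinct codewords that lead to each codeword."""
    return out_degrees([(target, source) for source, target in edges])

# Hypothetical transitions between four chords.
edges = [("C", "G"), ("G", "C"), ("C", "F"), ("F", "G"), ("G", "Am")]
print(density(edges), out_degrees(edges), in_degrees(edges))
```

Comparing `out_degrees` with `in_degrees` node by node gives the kind of ratio used later in the text to describe the symmetry between the two.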
From now on, we focus on networks of triads, where results are more easily
interpreted in the framework of classical harmonic analysis. Codeword frequency
and codeword degree are almost perfectly monotonically correlated, with Spearman
rank coefficients above 0.99 for all decades. So the most frequent chords,
among which are the main triads defining tonality, are also the most connected.
A notable symmetry emerges here, which has also been observed by Serra
et al. [8]. Contrasting the out-degree of the major and minor triads over all
chromatic scale degrees (in the musical sense of the word) with their similarly
defined in-degree (the number of different chords that lead to a given one), we find
that their values are extremely similar, with their mean ratio over all triads between
0.99 and 1.02, and standard deviations between 0.01 and 0.1, for all decades.

Clustering measures the transitivity of the network. The local clustering
coefficient c_i = 2 t_i / (k_i (k_i - 1)) relates the number t_i of closed triangles
among the nodes connected to node i to the maximum possible for its degree k_i.
Harmonically, if we interpret the network as giving the stylistically
permissible chord transitions, a high c_i implies that a transition between
codeword i and another codeword that could be done directly could also often
be realized with an intermediate linking chord, adding to the variety of harmonic
conduction. Here we measure local connectivity by C, the average of c_i. A global
measure of connectivity is the average shortest path length l. This gives the
average of the minimum number of intermediate chords that are necessary to go
between two given codewords. For instance, the appearance of bold and abrupt
harmonic progressions that link tonally distant chords side by side would tend
to reduce the value of l. High levels of clustering and small values of l define a
small-world network [18]. Finally, assortativity by degree r is a coefficient
measuring the tendency of nodes with similar degree to connect to each other. It
is positive if this effectively occurs and negative if nodes of high degree tend to
connect with nodes of low degree and vice versa. To interpret these coefficients,
they are to be compared with the same coefficients computed from a random
network with the same degree distribution, which we construct with the rewiring
method described in [19]. For our networks, there is a marked increase of C (the
average of c_i) from a value of 0.35 in 1910-1919 to values in the range [0.47,
0.58] for the following decades. Corresponding random networks have clustering
coefficients in [0.1, 0.18]. At the same time, l decreases from 2.57 to 1.86 between
the first two decades, and then slightly increases to 2.11 in the ’50s; these values
are smaller than the ones obtained by randomizing links. So globally, the
small-worldness increases over time, implying a trend towards richer
and more daring harmonic progressions, with more different choices and also shorter
ways to go from one chord to another (Fig. 3). Assortativity remains negative,
in the range [-0.09, -0.21], with a slight increase to -0.17 in the ’50s. Keeping
in mind the direct correspondence between frequent and connected chords,
this means an increasing tendency to avoid direct connections between the most
common triads. However, while assortativity values are more negative than random
in 1920-1950, they are less negative than for the randomized networks in
1910-1919 and 1950-1959.
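
The two small-world ingredients can be illustrated on a toy graph. The following sketch assumes an undirected, connected network stored as a dictionary of neighbor sets; the graph itself is hypothetical, the clustering coefficient uses the standard triangle-fraction definition, and path length is counted in steps (edges) between nodes.

```python
from collections import deque
from itertools import combinations

def local_clustering(adj, node):
    """Fraction of pairs of neighbors of `node` that are themselves linked."""
    neighbors = adj[node]
    k = len(neighbors)
    if k < 2:
        return 0.0
    links = sum(1 for u, v in combinations(neighbors, 2) if v in adj[u])
    return 2 * links / (k * (k - 1))

def average_clustering(adj):
    return sum(local_clustering(adj, node) for node in adj) / len(adj)

def average_shortest_path(adj):
    """Mean number of steps between all ordered pairs of reachable nodes."""
    total, pairs = 0, 0
    for start in adj:
        dist = {start: 0}
        queue = deque([start])
        while queue:  # breadth-first search from `start`
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs

# Hypothetical graph: a triangle A-B-C with a pendant node D attached to C.
adj = {"A": {"B", "C"}, "B": {"A", "C"}, "C": {"A", "B", "D"}, "D": {"C"}}
print(average_clustering(adj), average_shortest_path(adj))
```

In the study these quantities are compared against degree-preserving randomized networks; that rewiring step is not sketched here.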
Fig. 3. Average shortest path length l versus clustering coefficients for actual (triangles)
and randomized (squares) triadic networks.


3 Conclusions 

Looking at tango from the network science perspective, we are able to single out 
some clear trends in tango evolution, and to compare them with the changes 
in other genres of music described in [8]. Tango appears to have a relatively 
limited harmonic vocabulary (even if we consider unfiltered networks), and data 
are compatible with an innovation model exhibiting a slow rate of appearance of 
novelties. But in the period considered here, inversely to the tendencies shown 
in [8], progressively richer and more complex chord transitions emerged within 
this universe, which increased its small world features. 
Abstract. In this paper we propose the systemic modeling of Camargo
Guarnieri’s Ponteio No. 1 with the aim of identifying a hypothetical
compositional system that gave rise to this work. From this compositional
system we will plan a new work for woodwind trio. The model, specifically
related to the harmonic syntax and the melodic gestures, is encoded
into two algorithms written in Python and MATLAB.


Keywords: Systemic modeling • Compositional system 
Compositional planning • Guarnieri’s Ponteios 


1 Introduction 

This paper describes the methodological procedures for the systemic modeling of
Ponteio No. 1, from the First Book of Ponteios, by Brazilian composer Camargo
Guarnieri (1907-1993). The purpose of this methodology is to propose a
hypothetical compositional system that gave rise to this work, such that, from this
system, it is possible to plan and compose a new work for an instrumental set
distinct from the original one (piano). The modeling will be achieved with the
use of a technique we call parametric generalization, which is defined below and
explained in more detail during the course of the modeling itself. Initially, we will
formally define the terms systemic modeling, compositional system, and
parametric generalization, and then accomplish the systemic modeling of Guarnieri’s
Ponteio No. 1. From the system resulting from this modeling, we will plan and
compose a piece for woodwind trio (oboe, clarinet, and bassoon). The results of the
modeling will provide data for the creation of two computational algorithms in
MATLAB and Python, which have the function of generating materials within
the scope of the pitch parameter with the same profile as the original work.

2 The Fundamentals of Systemic Modeling 

A model is a “simplified representation of a real system with the aim of studying
this system” [1]. In the field of engineering, modeling can offer a physical model
and a mathematical model that represents the characteristics of a system with
high accuracy, in order to test its operating conditions and in particular its limits.
In the musical domain, the model of a particular work may be proposed from
analytical tools that can describe its structural relationships. However, from a
compositional perspective, it is not our interest to propose a comprehensive and
multidimensional modeling from which one can replicate the original work in
all its aspects, since it is not our intention to rebuild the analyzed work, but to
build a different one, which only keeps a certain degree of kinship with the original.
Thus, systemic modeling is intentionally partial and only focuses on some aspects
of a work. The degree of relationship between the analyzed work, which can be
considered in some respects an intertext, and the new work happens in the realm
of a deep structure called compositional system, extensively studied in Lima [2].
A compositional system is determined from a series of guidelines that act directly
on the construction of a musical vocabulary and syntax, that is, the building of
materials and the relationships between them. These guidelines may be originally
designed or modeled from another work, as is the case in this study. In the
latter case, it works as a kind of abstract intertextuality, in contrast with the
kind of literal intertextuality, in which the surface levels of the intertexts reveal
themselves more straightforwardly. These systemic guidelines (or definitions) can
be expressed in a written language or translated into computer algorithms.

The systemic modeling consists methodologically of three stages: parametric¹
selection, analysis, and parametric generalization. In the first stage one selects,
through a prospective analysis, the parameters that can render the best
analytical result. In the analytical phase, we describe the behavior of the selected
parameters of the analyzed work, for example, the syntax of the harmonic
structure, the profile of melodic contours, the rhythmic patterns, etc. In the last stage,
the values associated with these parameters obtained in the analysis are
generalized, that is, are emptied of particular values. Thus, for example, if the analysis
reveals that the intrinsic structure of a work is built from the parsimonious
connection of set classes [012], [013], and [014], one can generalize this information
simply by stating that this structure is consolidated through the parsimonious
connection of pitch-class sets. The parametric generalization is the methodological
key that allows us to envision a hypothetical compositional system related to a
musical work. Such a system is hypothetical because it disregards the author’s
intention, that is, it is not our interest to examine how the composer of the
original work designed its structure, much less to identify a compositional system
suitable to encompass the entire set of works of a particular author.

Once the compositional system has been determined, one can follow the reverse path,
which consists of assigning particular values to the generalized parameters. This
phase is called compositional planning and is in a way opposite to the analysis.
It is through the compositional planning that a composer assigns values to the


1 It is important to mention that we are considering here an expansion of the concept 
of parameter: instead of being associated to surface level elements, which are closely 
related to a specific aesthetic profile, a parameter can be as abstract as an inversional 
axis, for example, which disregards tonal or atonal biases. 
parameters described in the compositional system and also takes free
compositional decisions about undeclared parameters. Thus, one arrives at a structure
that has, under certain perspectives, the same systemic lineage as the original
work but is still in a raw state. In a final stage, this raw structure is refined
in order to bring the work in line with a certain aesthetic inclination.


[Score excerpt: three-note fragments labeled (AB2) [014], (367) [014], and (458) [014]]


Fig. 1. Initial gestures of Webern’s Konzert, Op. 24.


In order to clarify the methodological steps of systemic modeling, we will take
the initial gestures of Webern’s Konzert, Op. 24, shown in Fig. 1. As one can see,
it consists of four three-note fragments for Oboe, Flute, Clarinet, and Trumpet.
Each fragment has its own information on the following parameters: pitch (in
terms of pitch-class and register), rhythm (in terms of duration and time-point),
dynamics, articulation, timbre, and tempo. In the first phase of systemic
modeling, parametric selection, we will select the parameters that will be the focus
of the procedure. As we have previously mentioned, we are not interested in an
exhaustive modeling of the piece, since our purpose is only to capture some of
its deep features, from which we will plan and create a new original work. In this
short example, we will only select the pitch parameter. The first consequence of
this selection is that we lose all the information regarding the other parameters.
Examining Fig. 2, we can verify that the pitches of Webern’s excerpt were
transferred to a musical staff with no other additional parametric information.
The numbers inside parentheses indicate the normal form and the numbers inside
brackets indicate the prime form for each fragment².

The second phase of systemic modeling, analysis, will reveal to us the
relationships amongst the three-note fragments. It consists of three pitch-class sets
related by transposition and inversion, which means that they belong to the same
set class. Furthermore, they constitute a twelve-tone series, i.e., we have twelve
distinct pitch classes forming a series derived from trichordal class [014]. At this

2 Numbers 10 and 11 are represented by their hexadecimal equivalent, A and B, to 
avoid ambiguity. 
Fig. 2. The methodological cycle of systemic modeling applied to the pitch parameter
in the first three-note fragments of Webern’s Konzert, Op. 24.


point we still have objects (pitch-class specification for the fragments) and
relationships. The last phase of systemic modeling, parametric generalization, will
consider only relationships between potential objects. This phase is fundamental
to the modeling process, as we have laid it down here, since it makes it possible to
replace the original objects with entirely different new ones following the same
relationships. This last phase is also important in the methodology because it
allows us to define the compositional system. In this case, the system can be
defined by a single rule: “Choose a trichordal class and build a derived series”.
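
The analysis step for the labeled fragments can be checked mechanically. The sketch below computes prime forms with the "most packed to the left" comparison (Rahn's convention, an assumption on our part; for trichords it agrees with Forte's), reading A and B as 10 and 11 as in the paper. The function names are illustrative only.

```python
def parse_set(text):
    """Read a set such as 'AB2', with A = 10 and B = 11."""
    return [int(ch, 16) for ch in text]

def most_compact(pcs):
    """Most left-packed ascending rotation of a pitch-class set, starting on 0."""
    pcs = sorted({p % 12 for p in pcs})
    rotations = []
    for i in range(len(pcs)):
        rot = pcs[i:] + [p + 12 for p in pcs[:i]]
        rotations.append(tuple(p - rot[0] for p in rot))
    # Compare the outer interval first, then move inwards.
    return min(rotations, key=lambda r: r[::-1])

def prime_form(pcs):
    """Choose the more compact of the set and its inversion."""
    inverted = [-p for p in pcs]
    return min(most_compact(pcs), most_compact(inverted), key=lambda r: r[::-1])

for label in ("AB2", "367", "458"):
    print(label, prime_form(parse_set(label)))
```

All three labeled fragments reduce to the same prime form, which is what allows the generalization step to discard the particular pitch classes and keep only the relationship.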

The compositional system can be used to plan a new work. In the
compositional planning, we will execute the reverse process: we build a series derived
from a single trichordal class and complete the parametric information that was
removed as a result of the systemic modeling. If we choose, for example, the
trichord (047), which is a member of trichordal class [037], we can apply
transpositional and inversional operations in such a way that it will yield a derived
series. One possibility for such a series would be: 0-4-7-9-5-2-3-8-B-A-6-1. This
series, the only mandatory connection with Webern’s fragment through the rule
expressed in the compositional system, can be musically realized as shown in
Fig. 3, with the other parameters (rhythm, timbre, dynamics, articulations, and
tempo) freely chosen by the composer.
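
The rule "choose a trichordal class and build a derived series" can be tested on the proposed series. The sketch below is a minimal check, assuming the usual definitions T_n(p) = p + n and T_nI(p) = n - p (mod 12); the helper names are hypothetical.

```python
def tn(pcs, n):
    """Transposition T_n."""
    return frozenset((p + n) % 12 for p in pcs)

def tni(pcs, n):
    """Transposed inversion T_nI."""
    return frozenset((n - p) % 12 for p in pcs)

def same_set_class(a, b):
    """True if b is some transposition or transposed inversion of a."""
    target = frozenset(p % 12 for p in b)
    return any(tn(a, n) == target or tni(a, n) == target for n in range(12))

def is_derived_series(series, size=3):
    """All twelve pitch classes, split into segments of one set class."""
    if sorted(series) != list(range(12)):
        return False
    chunks = [series[i:i + size] for i in range(0, 12, size)]
    return all(same_set_class(chunks[0], chunk) for chunk in chunks[1:])

# The series proposed in the text, with A = 10 and B = 11.
series = [int(token, 16) for token in "0-4-7-9-5-2-3-8-B-A-6-1".split("-")]
print(is_derived_series(series))
```

The same check could be used during compositional planning to validate any candidate series built from a freely chosen trichord.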
[Score excerpt: Bassoon, tempo mark = 112, dynamics p, mf, pp]


Fig. 3. New fragment created from the compositional system of the first three-note
fragments of Webern’s Konzert, Op. 24.


The methodology of systemic modeling developed through a convergence of
the theory of compositional systems and the theory of intertextuality. The
theory of compositional systems loosely derives from Bertalanffy’s theory of general
systems. For Bertalanffy a system consists of “sets of elements standing in
interaction” [3]. Music and language belong to the category of symbolic systems,
which are formally defined by the “rules of the game”. Klir [4], in turn, defines
a system (S) as a set of objects (O) and relations (R), or formally S = (O, R).
Meadows [5] enhances Klir’s definition with the introduction of the
functionality factor. Lima [2], inspired by those authors, proposes that “a compositional
system is a set of guidelines to form a coherent whole that coordinates the use
of musical parameters and their interconnection, in order to produce musical
works”. We suggest an update to this definition by adding the terms “and
musical materials” right after “musical parameters”. This is particularly important
in the cases when the materials are used in their entirety, i.e., without
parametric fragmentation, such as in works that combine intertextual materials
in a literal fashion (Berio’s Sinfonia or Rochberg’s Music for the Magic Theater,
for example).

The theory of intertextuality is another vital piece in the definition of
systemic modeling. Kristeva [6] states, “all text is constructed as a mosaic of
quotations, every text is absorption and transformation of another text”. Kristeva [7]
also highlights the relations between language and music, consequently bringing
intertextual thinking to the domain of music composition, a resource already
employed in the past, as clearly demonstrated by Korsyn [8] and Klein [9]. The
methodology of systemic modeling employs intertextuality in a more abstract
manner. As Lima [2] observed, the theoretical and artistic references in this field
reveal that “the production of new texts can be obtained both through the
literal use of intertext as through modified version of them”. The latter can be
called abstract intertextuality and, when applied to a set of specific musical
parameters, parametric intertextualization. The effectiveness and functioning of the
methodology of systemic modeling can already be observed in the studies of
xxxxxx and his graduate and undergraduate students [10-16].³


3 Peer-reviewed papers blindly evaluated by researchers in the fields of composition 
and theory. All these papers contain at least one piece created with the systemic 
modeling of another piece. Some of the new pieces were already premiered. 
3 Systemic Modeling of Guarnieri’s Ponteio No. 1

Figure 4 shows the first six bars of Guarnieri’s Ponteio No. 1, a slow,
sorrowful piece of 32 bars. The macrostructure is a loose ABA+coda. In
our analytical methodology we consider that the work can be divided into two
layers. The first layer, corresponding to the right hand of the piano, consists
of seven melodic gestures (the last being a slightly varied recapitulation of the
first). The second layer, corresponding to the left hand of the piano, is divided
into two sub-layers: (1) a rhythmic figuration that is repeated throughout the
entire work and (2) long notes in a lower register. As the rhythmic feature is not
being considered in this analysis, the rhythmic figurations are compressed such
that the second layer will be seen as a single block of four voices.





Fig. 4. First six bars of Guarnieri’s Ponteio No. 1. Copyright by Universal Music
Publishing Group. Printed with permission.


The modeling will be accomplished separately for each layer. For the top 
layer, we were inspired, to some extent, by the theory of developing variation, 
especially as proposed by Almada [17], since we seek to identify a generator set 
for all the work’s melodic gestures, which are obtained from a series of operations 
applied to this hypothetical generator set. The analytical methodology for the 
second layer, also inspired by the developing variation, consists in describing 
the parsimonious relations between the harmonic structures and the subsequent 
proposition of two generating structures: a chord and an interval set. 

Figure 5 shows an analysis of the entire lower layer, indicating the chromatic
sets in the order in which they appear (and not in normal form), and the intervals
separating each element of these sets in relation to the adjacent sets. We have
noticed that these intervals configure parsimonious movements, here defined as
intervals of a major second at most. These intervals of parsimonious movements
form sets indicated in Fig. 5 inside brackets and labelled with letters of the
alphabet. It is noteworthy that these sets of intervals are subsets of arrangements
with repetitions of five elements, taken four at a time (AR_{5,4})⁴, resulting in a

4 The formula for the calculation of arrangements with repetitions is: AR_{n,p} = n^p [18].
total of 625 possibilities. Figure 6 displays the relationships between all sets of
intervals and the first set, which will be considered here as the generator set:
[+2, +2, +1, +2].
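
The universe of 5^4 = 625 arrangements can be enumerated explicitly. The sketch below assumes that the five parsimonious movements are ordered from -2 to +2, that the last interval varies fastest, and that positions are counted from 1; this ordering is our assumption, chosen because it places the generator set at position 620 and its inversion at position 6, as listed in Fig. 6.

```python
from itertools import product

MOVES = (-2, -1, 0, 1, 2)                        # parsimonious movements
ARRANGEMENTS = list(product(MOVES, repeat=4))    # 5**4 arrangements with repetition

def position(intervals):
    """1-based position of a set of four intervals in the enumeration."""
    return ARRANGEMENTS.index(tuple(intervals)) + 1

generator = (2, 2, 1, 2)
print(len(ARRANGEMENTS), position(generator))
```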



Fig. 5. Modeling of the lower layer of Guarnieri’s Ponteio No. 1.


In Fig. 6, the first column (left to right) indicates the position of the set of
parsimonious movement intervals within the universe of the 625 above-mentioned
possibilities; the second column shows the set of intervals; the third, the
movement type according to the analytical labels shown in Fig. 5; the fourth,
the type of operation that relates this set of intervals to the generator set (the
first one)⁵; and the last column indicates the number of times that the set of
intervals appears in the analysis. To complete the modeling of the lower layer,

5 INV(C), Inversion: inverts the sign of each element of C; RET(C), Retrogradation:
performs the retrogradation of C; ROT(C,n), Rotation: rotates the set C n
times; SUBROT(C,n), Subrotation: rotates the last three elements of C n times;
COMP(C,n), Compression: subtracts n from each element of C; MULT(C,n),
Multiplication: multiplies each element of C by n; and SOMA(C,D), Concatenation:
concatenates the sets C and D.
Arr. | Sets of intervals         | Type | Operation                                                                             | Qty.
620  | ['+2', '+2', '+1', '+2']  | A    |                                                                                       | 3
6    | ['-2', '-2', '-1', '-2']  | B    | INV(A)                                                                                | 3
45   | ['-2', '-1', '+1', '+2']  | C    | SOMA((SOMA((MULT((SOMA(ROT(COMP(A,2),1),ROT(COMP(A,2),2))),3)),ROT(COMP(A,2),2))),A)  | 3
581  | ['+2', '+1', '-1', '-2']  | D    | INV(C)                                                                                | 2
44   | ['-2', '-1', '+1', '-1']  | E    | SOMA(SOMA(SOMA(MULT(ROT(COMP(A,2),3),3),B),C),A)                                      | 1
599  | ['+2', '+1', '+2', '+1']  | F    | SOMA((ROT(A,1)),(ROT(COMP(A,2),3)))                                                   | 1
314  | ['0', '0', '0', '+1']     | G    | INV(ROT(COMP(A,2),3))                                                                 | 1
604  | ['+2', '+2', '-2', '+1']  | H    | SOMA(ROT(A,3),MULT(COMP(A,2),4))                                                      | 1
306  | ['0', '0', '-1', '-2']    | I    | SOMA(INV(A),MULT(SOMA(ROT(INV(COMP(A,2)),1),ROT(INV(COMP(A,2)),2)),2))                | 1
276  | ['0', '-1', '-2', '-2']   | J    | SOMA(ROT(INV(A),1),MULT(ROT(INV(COMP(A,2)),2),2))                                     | 1
27   | ['-2', '-1', '-2', '-1']  | K    | INV(F)                                                                                | 1
624  | ['+2', '+2', '+2', '+1']  | L    | ROT(A,3)                                                                              | 1
600  | ['+2', '+1', '+2', '+2']  | M    | ROT(A,1)                                                                              | 1
338  | ['0', '+1', '0', '0']     | N    | ROT(INV(COMP(A,2)),1)                                                                 | 1
595  | ['+2', '+1', '+1', '+2']  | O    | SUBROT(F,2)                                                                           | 1
38   | ['-2', '-1', '0', '0']    | P    | RET(I)                                                                                | 1
163  | ['-1', '-1', '0', '0']    | Q    | SOMA(ROT(COMP(A,2),1),ROT(COMP(A,2),2))                                               | 3
158  | ['-1', '-1', '-1', '0']   | R    | ROT(INV(COMP(A,1)),3)                                                                 | 1
463  | ['+1', '+1', '0', '0']    | S    | INV(Q)                                                                                | 1
468  | ['+1', '+1', '+1', '0']   | T    | INV(R)                                                                                | 1
302  | ['0', '0', '-2', '-1']    | U    | ROT(RET(I),2)                                                                         | 1
592  | ['+2', '+1', '+1', '-1']  | V    | SOMA(O,MULT(ROT(COMP(A,2),3),3))                                                      | 1
32   | ['-2', '-1', '-1', '-1']  | W    | SOMA(ROT(R,3),MULT(ROT(COMP(A,2),2),2))                                               | 1
169  | ['-1', '-1', '+1', '+1']  | X    | SOMA(RET(S),Q)                                                                        | 2
607  | ['+2', '+2', '-1', '-1']  | Y    | SOMA(SOMA(MULT(ROT(COMP(A,2),3),2),U),A)                                              | 1
588  | ['+2', '+1', '0', '0']    | Z    | INV(P)                                                                                | 1


Fig. 6. Operations that relate all the sets of intervals of parsimonious movements with
the generator set (A), in the lower layer of Guarnieri’s Ponteio No. 1.


these operations become part of a script written in Python, which will allow us
to propose several generative values for both the initial chord and the initial set
of intervals. This will enable us to generate the entire set of chords, which will
be available in PDF format via LilyPond⁶.
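
The operations of Fig. 6 are simple enough to sketch in Python. The code below is not the authors' script; it is a hedged reconstruction in which ROT is read as a rotation to the left and SOMA as an element-by-element sum, two assumptions under which the tabulated rows checked here (B, F, G, L, M, O, Q) are reproduced. The initial chord and the `apply_moves` helper are hypothetical.

```python
def inv(c):
    """INV: invert the sign of each element."""
    return [-x for x in c]

def ret(c):
    """RET: reverse the order of the elements."""
    return list(reversed(c))

def rot(c, n):
    """ROT: rotate n positions to the left (assumed direction)."""
    n %= len(c)
    return c[n:] + c[:n]

def subrot(c, n):
    """SUBROT: rotate only the last three elements."""
    return c[:1] + rot(c[1:], n)

def comp(c, n):
    """COMP: subtract n from each element."""
    return [x - n for x in c]

def mult(c, n):
    """MULT: multiply each element by n."""
    return [x * n for x in c]

def soma(c, d):
    """SOMA: read here as an element-by-element sum (assumption)."""
    return [x + y for x, y in zip(c, d)]

def apply_moves(chord, moves):
    """Move each voice of a four-voice chord by the matching interval."""
    return [pitch + move for pitch, move in zip(chord, moves)]

A = [2, 2, 1, 2]                              # generator set of intervals
B = inv(A)
F = soma(rot(A, 1), rot(comp(A, 2), 3))
O = subrot(F, 2)

chord = [48, 55, 60, 64]                      # hypothetical initial chord (MIDI numbers)
print(apply_moves(chord, A), apply_moves(apply_moves(chord, A), B))
```

Changing `A` and `chord` corresponds to proposing new generative values for the initial set of intervals and the initial chord, which is the use the text describes for the script.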

We discuss now the modeling of the upper layer. The first melodic gesture
shown in Fig. 7 (upper part) can be segmented into two trichords [013], whose
normal forms (B02) and (9B0) are mutually related by T_{11}I, where 11 is the first
pitch-class of the first normal form. The juxtaposition of these two trichords
defines the entire contents of the first gesture, which consists of a tetrachord
[0235] in the normal form (9B02). This tetrachord, in turn, has two trichordal
subsets: the generator trichord [013] and the trichord [025], which is used in the
third and fourth gestures. Generalizing, we can say that the juxtaposition of the
generator set C_{x,0}, consisting of pitch-class set {c_1, c_2, ..., c_n}⁷, with one of its
transposed inversions (except for operations that would result in the generator


6 Open-source application for editing music scores, available at
http://www.lilypond.org/, accessed 02.22.2015.

7 The generator set is identified as C_{x,0}, in which x = 3, 4, 5, 6, i.e., the set can be a
trichord, a tetrachord, a pentachord, or a hexachord. The first value (x) indicates
the set’s cardinality (how many elements the set has) and the second value (0) is
simply a label to differentiate the set from the other sets used in the system.
set itself⁸) produces a set with greater cardinality⁹, which is the material of the
first melodic gesture of the work. Formally, this set is C_{x+y,1} =
C_{x,0} + T_iI(C_{x,0}), with i = c_1 ∈ C_{x,0}, as long as C_{x,0} ≠ T_iI(C_{x,0}). Additionally,
the derived set (C_{x+y,1}) generates a series of subsets of cardinality x. One is the
generator set. The other set is used in the third and fourth gestures.
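
The construction of the first gesture's material can be sketched as follows. This is an illustrative reading, not the authors' code: the "+" of the formula is taken as set union, T_iI(p) is taken as i - p (mod 12), and the generator is expected as a normal form whose first element supplies i.

```python
def tni(pcs, n):
    """Transposed inversion T_nI of a collection of pitch classes."""
    return {(n - p) % 12 for p in pcs}

def first_gesture_set(generator):
    """Union of a generator set (given in normal form) with its T_iI,
    where i is the first pitch class of that normal form."""
    i = generator[0]
    inverted = tni(generator, i)
    if inverted == set(generator):
        raise ValueError("T_iI maps the generator onto itself")
    return set(generator) | inverted

print(sorted(first_gesture_set([11, 0, 2])))    # Guarnieri's generator (B02)
print(sorted(first_gesture_set([4, 5, 10])))    # generator (45A) of the new work
```

The two calls show the two situations mentioned in the footnotes: shared pitch classes yield a tetrachord, while no shared pitch classes yield a hexachord.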




Fig. 7. Segmentation of the first and the second gestures of Guarnieri’s Ponteio No. 1.


The second melodic gesture, shown in Fig. 7 (lower part), is formed by the
juxtaposition of two tetrachords [0134], whose normal forms (89B0) and (1245)
are related by T_5, in which 5 is the last pitch-class of the second normal form,
(1245). This tetrachord, labeled in Fig. 7 as C_{4,2}, i.e., the second tetrachord
detected in the segmentation, is a superset of the generator trichord C_{3,0} =
[013]. A fundamental feature of this tetrachord is that the prime form of its
first three pitch-classes equals the prime form of its last three pitch-classes. One
can generalize the constructional principle of this second gesture considering the
principle of formation of its pitch-class sets. Thus, we consider that the generator
set is C_{x,0} and has pitch-classes {c_1, c_2, ..., c_n}, and the derived set is C_{x+1,2} and
has pitch-classes {c_1, c_2, ..., c_{n+1}} in such a way that [c_1, c_2, ..., c_n] = [c_2, c_3, ..., c_{n+1}].
The derived set (C_{x+1,2}) appears in two normal forms related by T_y, in which y
is the last pitch-class of the second normal form.

The third gesture is formed by the melodic trichord [025], which is the prime 
form of one of the subsets of the tetrachord of the first gesture, used in the 
normal form (790) and framed by two generator trichords, whose normal forms 

8 For example, T_4I(024) = (024).

9 This cardinality depends on the number of common pitch-classes between the
generator set and its transposed inversion. In the first gesture of Guarnieri’s Ponteio
No. 1 the concatenation of two trichords produced a tetrachord because there are
two common pitch-classes (0 and 11). If there are no common pitch-classes the result
is a hexachord, as we see in the compositional planning of the new work, in which
the generator trichord (45A) will produce the hexachord (456AB0) from the same
operations used in Guarnieri’s first gesture.
(79A) and (457) are correlated by T_2I, where 2 is the second pitch-class of the
prime form [025]. In a generalized way, the chromatic set of the third gesture is
formed by C_{x,0} + C_{x,1} + C_{x,0}. C_{x,0} is manifested in two normal forms related
to each other by T_zI, where z is the second pitch-class of the prime form [C_{x,1}].
This gesture is shown in Fig. 8.



[Score excerpt: Gesture 4, segmented as (790), [025]; (479), [025]; and (B012), [0123], with C_{4,3} = [0123]]


Fig. 8. Segmentation of the third and the fourth gestures of Guarnieri’s Ponteio No. 1.


The fourth melodic gesture (Fig. 8) consists of two trichords [025] juxtaposed with a tetrachord [0123]. The normal forms of the trichords, (790) and (479), are related by $T_4I$, in which 4 is the first pitch-class of the second normal form. The first pitch-class of the normal form of the tetrachord [0123], that is, (B012), is the sum of the first pitch-classes of the trichords (790) and (479), i.e., 7 + 4 = B. The tetrachord is derived by the chromatic completion of the generator trichord, that is, by filling the spaces between its pitch-classes. Thus, [013] becomes a chromatic tetrachord through the insertion of pitch-class 2 between 1 and 3. In generalized situations, in which the generator set is chosen by the composer, the chromatic completion can produce a pitch-class set with cardinality higher than 4, as in the case of the trichord [048], which needs to be transformed into the nonachord [012345678] in order to achieve chromatic completeness. To generalize the material of the fourth gesture, we observe that it is formed by the juxtaposition of two sets $C_{x,1}$ whose normal forms are related by $T_wI$, in which w is the first pitch-class of the second normal form. A third set, consisting of the chromatic completion of the generator set $C_{x,0}$, is juxtaposed to these two sets. The first pitch-class of this set’s normal form is obtained by adding the first pitch-classes of the normal forms of the two $C_{x,1}$ sets.
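The chromatic completion and the choice of its starting pitch-class can be sketched as follows. This is an illustrative Python fragment under stated assumptions, not the authors' implementation: the function names are hypothetical, and the input is assumed to be an ascending list of integers that does not wrap around pitch-class 0 (as is the case for a prime form).

```python
def chromatic_completion(pcs):
    """Fill every gap between the lowest and highest pitch-class of the set."""
    return list(range(min(pcs), max(pcs) + 1))


def transpose_to_start(pcs, first):
    """Transpose an ascending set so that it begins on `first` (mod 12)."""
    shift = first - pcs[0]
    return [(x + shift) % 12 for x in pcs]


print(chromatic_completion([0, 1, 3]))        # [0, 1, 2, 3]
print(len(chromatic_completion([0, 4, 8])))   # 9

# First pitch-classes of (790) and (479): 7 + 4 = 11 (B).
start = (7 + 4) % 12
print(transpose_to_start(chromatic_completion([0, 1, 3]), start))  # [11, 0, 1, 2]
```

The last line reproduces the normal form (B012) of the tetrachord in the fourth gesture.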

The chromatic materials for the fifth and sixth melodic gestures shown in Fig. 9 are also derived from the generator set. The material for the fifth gesture is the trichord [014], obtained by raising the last pitch-class of the generator trichord [013] by one, and the sixth gesture’s material consists of the trichord [015], obtained by raising the last pitch-class of the fifth gesture’s trichord by one. The normal forms (014) and (015) coincide with the prime forms for both gestures. Generalizing, one can derive the fifth gesture by the unit increment of the last pitch-class of the generator set and the sixth gesture by the unit increment of the last pitch-class of the fifth gesture.
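As a small illustration, the unit increment can be written as a one-line operation on the last element of the set. The function name `increment_last` is hypothetical, and the sketch assumes the set is given as a list of integers in prime-form order.

```python
def increment_last(pcs):
    """Raise the last pitch-class of a set by one semitone (mod 12)."""
    return pcs[:-1] + [(pcs[-1] + 1) % 12]


generator = [0, 1, 3]
gesture5 = increment_last(generator)   # [0, 1, 4]
gesture6 = increment_last(gesture5)    # [0, 1, 5]
print(gesture5, gesture6)
```

Applied to the prime form [016] of the generator set of the new work, the same two steps give [017] and [018].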


[Figure: derivation, from the generator trichord [013], of Gesture 5, $C_{3,2}$ = [014], and Gesture 6, $C_{3,3}$ = [015].]

Fig. 9. Derivation of the trichords of the fifth and sixth melodic gestures of Guarnieri’s Ponteio No. 1, from the generator trichord [013].


The model for the upper layer was automated by a MATLAB function (ponteiol.m) that contains all the relationships described in the modeling. The composer enters an initial set and the ponteiol.m function provides all the other sets. Thus, for example, if we want to replicate the material of the top layer of Guarnieri’s Ponteio No. 1, we must enter the set {1120}.

4 Compositional Planning of Germinacion 

The compositional planning of the new work for woodwind trio (oboe, clarinet, and bassoon), entitled Germinacion 10, started with the generation of the set of chords for the bottom layer, through the insertion of a generator chord



Fig. 10. First eleven chords generated by the Python script, based on the systemic modeling of Guarnieri’s Ponteio No. 1.

10 This is the first movement of a piece entitled Vientos Tejanos, Op. 203 (2016), dedicated to the Vientos Tejanos Trio, from Texas (USA). The other two movements, Tejido and Siluetas, were also composed with the methodology of systemic modeling.

| Gesture | Label | Set | Description |
|---|---|---|---|
| Generator set | | (45A) | Prime form [016] |
| Gesture 1 | g1_1 | (45A) | (45A) |
| | g1_2 | (6B0) | (6B0) |
| | g1_3 | (456AB0) | Hexachord generated by the juxtaposition of g1_1 and g1_2. This hexachord will produce the pentachordal subset [01268], which will be used as the central material of the third gesture. |
| Gesture 2 | g2_1 | (56B0) | (56B0) and (1278) are related to each other by $T_8I$, in which 8 is the last pitch-class of (1278). |
| | g2_2 | (1278) | |
| Gesture 3 | g3_1 | (238) | (238) and (5AB) relate to each other by $T_1I$, in which 1 is the second pitch-class of [01268]. |
| | g3_2 | (78913) | The normal form of this pentachord resulted from the addition of 7 to its prime form [01268], the same value used in Guarnieri’s Ponteio No. 1 (see endnote 12). |
| | g3_3 | (5AB) | |
| Gesture 4 | g4_1 | (AB046) | This pentachord is obtained by applying $T_7I$ to the second pentachord. The value 7 is the first pitch-class of the second pentachord (g3_2). |
| | g4_2 | (78913) | This pentachord is generated by applying $T_9I$ to the pentachord’s prime form [01268]. This relationship is the same one found in Guarnieri’s Ponteio No. 1 (see endnote 13). |
| | g4_3 | (56789AB) | This heptachord is generated by the chromatic completion of the generator trichord (45A), which is transposed in order to start with pitch-class 5 (the sum of the first pitch-classes of g4_1 and g4_2). |
| Gesture 5 | g5 | (017) | Parsimonious ascending movement in the last pitch-class of the prime form of the generator set. |
| Gesture 6 | g6 | (018) | Parsimonious ascending movement in the last pitch-class of the prime form of g5. |

Fig. 11. Gestures generated by the MATLAB function ponteiol.m, based on the systemic modeling of Guarnieri’s Ponteio No. 1.


and a set of parsimonious intervals in the Python script. We chose the half-diminished chord (7A15) and the interval set [+2, +1, +1, -1], resulting in the entire chordal structure, the first eleven chords of which are shown in Fig. 10. In turn, for the generation of the melodic gestures we started from the trichord (45A), which, entered into the ponteiol.m function, generated all the material of the six gestures shown in Fig. 11. From these data, generated from the systemic modeling of Guarnieri’s Ponteio No. 1, we began the composition of the new work, freely dealing with other parameters (rhythm, dynamics, articulation) and the macrostructure. However, the rhythmic profile present in the bottom layer throughout Guarnieri’s Ponteio No. 1, as well as the manner of connecting some notes by non-harmonic tones in the upper layer (see Fig. 4) 11, sometimes appear in the new work. This is an attempt to relate the two works at the surface level, reinforcing the already existing deep systemic connection built by the modeling. Figure 12 shows the first page of Germinacion.


11 See in Fig. 4 the passing G♭ connecting the G of measure 2 to the F of measure 3 (if one considers the structural harmony to be formed by the chord A-C-G-B), and the D♯ between G and E in the fourth measure (considering the harmony to be A-C-E-G).


Fig. 12. First page of the new work, entitled Germinacion (first movement of Vientos Tejanos), based on the systemic modeling of Guarnieri’s Ponteio No. 1.

Abstract. This paper presents a corpus study that identifies the number of statistically distinct modes used in sacred and secular genres from 1400-1750. Corpora used for the study include Masses, motets, and secular songs from the Franco-Flemish School, works by Palestrina, secular Italian songs with alfabeto guitar tablature from the early seventeenth century, and works by J.S. Bach. K-means cluster analyses of key profiles determine the number of distinguishable modes in each corpus. The results of this study show that the number of modes present in a corpus depends not only on date of publication but also on the genre of a composition.


Keywords: Music computation • Machine learning • Early music • Music genre • Cluster analysis • K-means


1 Introduction 

It is often assumed that European music before common-practice tonality was built on a system of six, eight, or twelve modes and that music from the eighteenth century onwards was built on two: major and minor. Historical notation supports this with the number of signatures and final cadences possible in any given system. However, the results of this study show that the modal framework of a musical corpus depends on its genre and not just its era of publication. This study uses k-means cluster analyses of key profiles to determine the number of statistically distinct modes in given corpora of music.

1.1 Historical Notation 

Generally, music prior to the mid-seventeenth century was notated in a system that indicates several modes. Music from this period was often notated with two signatures, no flats (durus) and B flat (mollis), with several possible final cadences for each signature, which can be seen in Fig. 1a. By the eighteenth century a system of several key signatures that were each associated with a major and a minor key was well established, which can be seen in Heinichen’s musical circle from 1711, reproduced in Fig. 1b. The durus/mollis system indicates a multi-modal 1 framework, as seen in Table 1a, while the later system implies a two-mode framework. However, it is possible that the notated modes could cluster into only two based on the key profiles of their representative musical scores, as seen in Table 1b.

1 This paper uses the Greek modal names common today but with the understanding that these names were often not historically used.


| Signature | Final cadence |
|---|---|
| ♮ (durus) | C, D, E, F, G, A |
| ♭ (mollis) | C, D, F, G, A, B♭ |

(a) Mode possibilities in the durus/mollis system

[Figure: image of Heinichen’s musical circle]

(b) Heinichen’s musical circle (1711) in the common-practice system

Fig. 1. Comparison of modes and keys from the seventeenth and eighteenth centuries


Table 1. Two possible modal frameworks in early music

| Mode | Signature:Final |
|---|---|
| Ionian | ♮:C, ♭:F |
| Dorian | ♮:D, ♭:G |
| Phrygian | ♮:E, ♭:A |
| Lydian | ♮:F, ♭:B♭ |
| Mixolydian | ♮:G, ♭:C |
| Aeolian | ♮:A, ♭:D |

(a) Notated modes

| Mode | Signature:Final |
|---|---|
| Major | ♮:C, ♮:F, ♮:G, ♭:F, ♭:B♭, ♭:C |
| Minor | ♮:D, ♮:E, ♮:A, ♭:D, ♭:G, ♭:A |

(b) Possible coalesced modes due to accidentals and/or musica ficta

1.2 Previous Approaches

Key profiles have been used by Temperley and Marvin [18,19] and are based on cognitive studies by Krumhansl and Kessler [14]. A key profile is a twelve-dimensional vector, with one dimension for each of the pitch-classes 0-11. The value of each dimension is the percentage of a given selection of music taken up by the associated pitch class. Temperley and Marvin created major and minor key profiles based on a corpus study of the string quartets by Mozart and Haydn. The key profile of a selection of music can be compared to the major and minor key profiles to determine the key and mode of the given selection.

Albrecht and Huron published a study of modality and key profiles in fifty-year epochs between 1400 and 1750 [1,2]. Each epoch contained fifty scores from representative composers. They created key profiles from the first and final ten measures of each piece. They used Ward’s method, which is a hierarchical clustering algorithm that shows modes and sub-modes. They found that music prior to 1700 featured some sub-clusters while music after 1700 clustered into only two distinct modes.

2 Methodology 

This study expands that of Albrecht and Huron with a much larger corpus and uses a different methodology. It analyzes entire pieces rather than only the first and last measures. 2 K-means clustering is used to find the number of modes that best represents a given corpus rather than to find a hierarchy.

This study was implemented in Python. Musical scores were parsed with music21 [5], and the machine learning algorithms were based on the scikit-learn module [15].


2.1 Corpora: Chronology and Genres

To compare genres and chronology, the corpus is divided by genre and time 
period, which can be seen in Table 2. All corpora except for Bach were notated 
in the durus/mollis system. The Bach and Palestrina corpora were included with 
the music21 Python module [5], and the Franco-Flemish corpora were taken from 
the Josquin Research Project [20]. 

The alfabeto corpus is a collection of 529 secular Italian songs from the early seventeenth century that contain alfabeto tablature: letters that indicate specific hand shapes of chords for a Baroque guitarist to strum [3]. These chords all form either major or minor triads. 3 These letters, each paired with a bass note, provide a kind of continuo realization. The alfabeto chords and their associated bass notes were encoded by the author for this study.

2 This method still provides clear and high-scoring clustering. However, Albrecht and 
Huron’s method is necessary when extracting key profiles from later music where 
large sections may be in different keys. 

3 Christensen has explored the theoretical significance of this unique triadic practice [4].

Table 2. Corpora used for this project

| Corpus | Source | Dates | Genre | Corpus size |
|---|---|---|---|---|
| Franco-Flemish | Josquin Research Project | c.1420-c.1520 | Motet | 175 |
| | | | Mass | 394 |
| | | | Secular | 151 |
| Palestrina | music21 | 16th c. | Sacred | 903 |
| Alfabeto songs | Original | 1610-1651 | Secular | 529 |
| J.S. Bach | music21 | 18th c. | Mixture | 352 |


2.2 K-means Clustering

To find the modal clustering of a corpus, a key profile for each musical score is created with the first dimension being the final bass note. The placement of the other pitches is based on their distance in semitones (pitch-class interval) from the final bass note. Each song is also labeled with its key signature and final bass note. The labels are used to compare the key profile cluster results with the notated mode.
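A key profile measured from the final bass note can be sketched as follows. This is a minimal illustration, not the study's code: the function name is hypothetical, each note is assumed to count once (the text does not say whether notes are weighted by duration), and the input is assumed to be a plain list of pitch-class integers already extracted from a score.

```python
def relative_key_profile(pitch_classes, final_bass):
    """Twelve proportions, indexed by semitone distance above the final bass note."""
    counts = [0] * 12
    for pc in pitch_classes:
        counts[(pc - final_bass) % 12] += 1
    total = len(pitch_classes)
    return [count / total for count in counts]


# Hypothetical song fragment whose final bass note is G (pitch-class 7).
profile = relative_key_profile([7, 11, 2, 7, 9, 7], final_bass=7)
print(profile[0])   # 0.5
```

Because every profile starts on the final, songs with different finals but the same interval content above the final receive the same vector.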

To cluster the songs of a corpus, the k-means clustering algorithm is used, which is an unsupervised machine learning algorithm that clusters n data points into k clusters. Key profiles are compared using the Euclidean distance. Songs that have similar key profiles will have small Euclidean distances. Given a large set of data, songs with similar key profiles should cluster together in a 12-dimensional space.

Given the Euclidean distances of the songs within a corpus, the k-means algorithm attempts to partition the data, a set of songs $(x_1, x_2, \ldots, x_n)$, into k clusters, $S = \{S_1, S_2, \ldots, S_k\}$. The success of clustering is measured by the inertia (within-cluster sum of squares), which the algorithm tries to minimize [15]:

$$\sum_{i=1}^{k} \sum_{x \in S_i} \lVert x - \mu_i \rVert^2$$

where $\mu_i$ is the mean of the points in $S_i$. After initialization, the partitions are adjusted and scored again. This process continues until the inertia converges or reaches a desired minimum.
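The assign-and-update loop described above can be sketched in plain Python. The study itself relied on scikit-learn; the fragment below is only an illustrative stand-in under stated assumptions: the function names are hypothetical, the initial centroids are simply the first k points (a naive choice that can end in a poor local minimum), and the toy data are two-dimensional rather than twelve-dimensional.

```python
from math import dist


def mean(points):
    """Coordinate-wise mean of a list of equal-length tuples."""
    return tuple(sum(column) / len(points) for column in zip(*points))


def inertia(points, labels, centroids):
    """Within-cluster sum of squared Euclidean distances."""
    return sum(dist(p, centroids[label]) ** 2 for p, label in zip(points, labels))


def kmeans(points, k, max_iter=100):
    centroids = [tuple(p) for p in points[:k]]   # naive initialization (assumption)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda j: dist(p, centroids[j])) for p in points]
        if new_labels == labels:                 # partition unchanged: converged
            break
        labels = new_labels
        # Update step: each centroid moves to the mean of its members.
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                centroids[j] = mean(members)
    return labels, centroids, inertia(points, labels, centroids)


labels, centroids, score = kmeans([(0, 0), (10, 0), (0, 2), (10, 2)], k=2)
print(labels, score)   # [0, 1, 0, 1] 4.0
```

Library implementations typically repeat the procedure from several initializations and keep the partition with the lowest inertia.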

To determine the number of modes (k) that most accurately represents a corpus, values of k from 2 to 12 are tested. Two scoring metrics are used to determine the distinctiveness of the clustering and the degree to which the notated mode agrees with the clustered mode.

The silhouette coefficient, as defined by Rousseeuw [17], measures the distinctiveness of the clusters without considering the data labels and is based on two metrics: $a(i)$, the mean distance between a sample i and all other points in the same cluster, and $b(i)$, the mean distance between a sample i and all points in the next nearest cluster [15]. Thus the silhouette score, where $-1 \le s(i) \le 1$, can be defined as follows:

$$s(i) = \frac{b(i) - a(i)}{\max\{a(i), b(i)\}}$$

The model with the highest silhouette score will be selected as the number of 
modes that best represents the corpus. 
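For illustration, the silhouette score can be computed directly from its definition. The sketch below is not the scikit-learn routine used in the study; the function names are hypothetical, `labels` is assumed to be a list containing at least two distinct cluster labels, and a point that is alone in its cluster is given a score of 0.

```python
from math import dist


def silhouette_sample(i, points, labels):
    """s(i) = (b - a) / max(a, b) for the point with index i."""
    own = labels[i]
    same = [dist(points[i], p) for j, p in enumerate(points)
            if labels[j] == own and j != i]
    if not same:
        return 0.0                      # singleton cluster
    a = sum(same) / len(same)
    # Mean distance to each other cluster; b is the smallest of these.
    b = min(
        sum(dist(points[i], p) for p, label in zip(points, labels) if label == other)
        / labels.count(other)
        for other in set(labels) if other != own
    )
    return (b - a) / max(a, b)


def silhouette_score(points, labels):
    """Mean silhouette over all points."""
    return sum(silhouette_sample(i, points, labels)
               for i in range(len(points))) / len(points)


toy = [(0, 0), (0, 1), (10, 0), (10, 1)]
print(silhouette_score(toy, [0, 0, 1, 1]) > silhouette_score(toy, [0, 1, 0, 1]))  # True
```

Scoring the partitions obtained for each k from 2 to 12 and keeping the k with the highest mean silhouette gives the selection rule stated above.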

The completeness score (c), as defined by Rosenberg and Hirschberg [16, 411-2], measures the degree to which the notated modes (data labels) agree with the clustered modes by determining how successfully songs with the same notated mode are assigned to the same cluster. A corpus comprises N data points, which are described by a set of classes, or labels (C), and a set of clusters (K). The classes are the labeled key signatures, and the clusters are the modes. Rosenberg and Hirschberg define A as a contingency table produced by the clustering algorithm, where $A = \{a_{ij}\}$ and $a_{ij}$ is the number of data points that belong to class $c_i$ and cluster $k_j$.

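The result, where $0 \le c \le 1$, can be defined as follows:

$$c = 1 - \frac{H(K \mid C)}{H(K)}$$

where $H(K \mid C)$ is the conditional entropy of the cluster assignments given the class:

$$H(K \mid C) = -\sum_{c=1}^{|C|} \sum_{k=1}^{|K|} \frac{a_{ck}}{N} \log \frac{a_{ck}}{\sum_{k=1}^{|K|} a_{ck}}$$

and where

$$H(K) = -\sum_{k=1}^{|K|} \frac{\sum_{c=1}^{|C|} a_{ck}}{N} \log \frac{\sum_{c=1}^{|C|} a_{ck}}{N}$$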

The model with the highest completeness score will show which number of 
modes best agrees with the key signature notation, if any. 
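The completeness score can likewise be written out from the two entropies. This is a plain-Python sketch rather than the scikit-learn function used in the study; the function name is hypothetical, the class labels in the example are invented, and the score is taken to be 1 when all points fall into a single cluster, so that H(K) = 0.

```python
from collections import Counter
from math import log


def completeness(classes, clusters):
    """1 - H(K|C) / H(K), computed from parallel lists of class and cluster labels."""
    n = len(classes)
    joint = Counter(zip(classes, clusters))      # contingency counts a_ck
    class_totals = Counter(classes)
    cluster_totals = Counter(clusters)
    h_k = -sum((t / n) * log(t / n) for t in cluster_totals.values())
    if h_k == 0:
        return 1.0
    h_k_given_c = -sum((a / n) * log(a / class_totals[c])
                       for (c, _), a in joint.items())
    return 1 - h_k_given_c / h_k


# Every notated class sits inside a single cluster: complete.
print(completeness(["C", "C", "F", "F", "D", "D"], [0, 0, 0, 0, 1, 1]))  # 1.0
```

A class whose members are split evenly between two clusters, by contrast, drives the score toward 0.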

To determine the mode that represents each cluster, the coordinates of each cluster’s centroid are found. The seven dimensions with the highest values are extracted, and the mode is inferred from them. For example, Ionian mode (Major) would score highest in dimensions 1, 3, 5, 6, 8, 10, and 12 (scale degrees 1, 2, 3, 4, 5, 6, 7), while Mixolydian mode would score highest in dimensions 1, 3, 5, 6, 8, 10, and 11 (scale degrees 1, 2, 3, 4, 5, 6, ♭7).
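The lookup from a centroid to a mode can be sketched as below. The names `MODES` and `infer_mode` are hypothetical, the centroid values are invented for illustration, and dimensions are numbered from 0 here (semitones above the final), whereas the text numbers them from 1.

```python
# Semitone content of the six notated modes, measured from the final.
MODES = {
    frozenset({0, 2, 4, 5, 7, 9, 11}): "Ionian",
    frozenset({0, 2, 3, 5, 7, 9, 10}): "Dorian",
    frozenset({0, 1, 3, 5, 7, 8, 10}): "Phrygian",
    frozenset({0, 2, 4, 6, 7, 9, 11}): "Lydian",
    frozenset({0, 2, 4, 5, 7, 9, 10}): "Mixolydian",
    frozenset({0, 2, 3, 5, 7, 8, 10}): "Aeolian",
}


def infer_mode(centroid):
    """Name the mode formed by the seven highest-valued dimensions of a centroid."""
    ranked = sorted(range(12), key=lambda d: centroid[d], reverse=True)
    return MODES.get(frozenset(ranked[:7]), "unclassified")


# Invented centroid with a prominent lowered seventh (dimension 10).
centroid = [0.20, 0.00, 0.12, 0.01, 0.14, 0.10, 0.01, 0.18, 0.00, 0.10, 0.11, 0.03]
print(infer_mode(centroid))   # Mixolydian
```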


2.3 Visualizing Results 

Silhouette and completeness scores of the uncompressed (12-dimensional) data will be shown for 2-12 clusters. However, to visualize the clustering, principal component analysis (PCA) is used, which reduces the number of dimensions from twelve to two. The reduced data can be plotted on a 2-dimensional graph. Each data point on a graph is labeled with the notated mode of a song in the corpus. The distances between data points approximate the Euclidean distances between key profiles. Cluster centroids are numbered and later labeled with their modes. The partitions are visualized as Voronoi diagrams, where cluster membership is shown by color partitions in the background.

To determine the accuracy of the PCA-reduced graphs, the reduced data is processed through k-means clustering and scored using the silhouette coefficient and completeness score. If the compressed scores are similar to the uncompressed scores, the PCA-reduced graphs are an appropriate representation of the uncompressed data.

3 Results 

The results for each corpus will be presented in reverse chronological order. For each corpus there will be a line graph that shows the silhouette and completeness scores for 2-12 clusters and a PCA-reduced graph of the highest-scoring number of modes. The J.S. Bach corpus is the only corpus not notated in the durus/mollis system, and each song on its PCA graph is labeled with its key. All other corpora were notated in the durus/mollis system and are labeled with their signature and final cadence (e.g., ♭:G for G mollis, or “G Dorian”).

3.1 J.S. Bach 

The songs from the J.S. Bach corpus show strong clustering for only two clusters, which can be seen in Fig. 2. The completeness score is a perfect 1, indicating that all notated modes belong to the same clusters, and the silhouette score is also quite high. Both the completeness and silhouette scores drop off significantly after two clusters. It is also important to note that the clusters are mostly compact, which indicates strong clustering. The minor cluster is somewhat more spread out, which is perhaps to be expected due to the flexibility of scale degrees 6 and 7. The cluster centroids are as follows:

1. Major (Ionian)

2. Minor (Aeolian) 4

(a) K-means graph: distance is proportional to difference
(b) Silhouette and completeness scores of the Bach corpus

Fig. 2. K-means clustering of the Bach corpus

3.2 Alfabeto 

Although the Alfabeto corpus was notated in the durus/mollis system, it clusters 
into two modes, which can be seen in Fig. 3. The silhouette and completeness 
scores, while not as high as in the Bach corpus, still show a strong peak at two 
clusters and quickly fall off. If the notated modes reflected the clustered modes, 
there should be a strong peak at six clusters. However, it is clear that a two-mode 
system is at work in the actual notes despite the music’s notation. The cluster 
centroids are as follows: 

1. Major (Ionian) 

2. Minor (Aeolian) 




(a) K-means graph: distance is proportional to difference
(b) Silhouette and completeness scores of the alfabeto corpus, plotted against the number of clusters (modes)

Fig. 3. K-means clustering of the alfabeto corpus


3.3 Palestrina 

The Palestrina corpus, also notated in the durus/mollis system, clearly shows more than two clusters, which can be seen in Fig. 4. The silhouette and completeness scores are highest for two through five modes, with two and five being the highest peaks. The PCA-reduced graph in Fig. 4a shows five clusters. If two clusters were selected, clusters 2 and 5 would be partitioned as one cluster (although the data points would still be in the same place), and the remaining clusters would form another, albeit sprawling, cluster. The cluster centroids are as follows:

1. Minor (Aeolian)

2. Major (Ionian)

3. Dorian

4. Phrygian

5. Mixolydian

4 This natural minor mode centroid differs from Temperley and Marvin’s Mozart and Haydn minor key profile, which is in harmonic minor. However, the raised seventh degree is only slightly higher than the lowered [18, 195].

The only notated mode that did not cluster to itself was Lydian. The songs 
notated in Lydian mode mostly fell into the Ionian cluster, which shows that 
the #4 was lowered more frequently than it was left raised. This is due in part to 
the notated accidentals and the moderate amount of editorial ficta in the digital 
corpus. 



(a) K-means graph: distance is proportional to difference
(b) Silhouette and completeness scores of the Palestrina corpus

Fig. 4. K-means clustering of the Palestrina corpus


3.4 Franco-Flemish Genres 

The Franco-Flemish corpus from the Josquin Research Project can be divided 
into Mass movements, sacred motets, and secular songs. The corpora provide an 
opportunity to test whether the multi-mode system is a product of its time or 
whether genre may also play a role. 


Masses. The Mass movements cluster well in two to six modes, which can be seen in Fig. 5. None of the clusters is as distinct as those of the Palestrina corpus, but the completeness score shows that the corpus can be divided into six modes, and the notated modes still mostly cluster together. The five cluster centroids in Fig. 5a are as follows:

1. Dorian

2. Major (Ionian)

3. Phrygian

4. Mixolydian

5. Minor (Aeolian)

The only mode that does not become a distinct cluster is again the Lydian mode. If six clusters are chosen, the small group of songs between centroids 3 and 5 becomes a cluster, but its centroid key profile and its notated mode both indicate Aeolian mode, like centroid 5. The songs notated in Lydian mode are mostly nested within cluster 2, Ionian.



(a) K-means graph: distance is proportional to difference
(b) Silhouette and completeness scores of the Franco-Flemish Masses corpus

Fig. 5. K-means clustering of the Franco-Flemish Masses corpus


Motets. The results for the sacred motet corpus are quite similar to those for the Mass corpus, although there is a stronger peak at five modes in the completeness score, which can be seen in Fig. 6. The five cluster centroids in Fig. 6a are as follows:

1. Mixolydian 

2. Phrygian 

3. Dorian 

4. Major (Ionian) 

5. Minor (Aeolian) 



(a) K-means graph: distance is proportional to difference
(b) Silhouette and completeness scores of the Franco-Flemish Motet corpus

Fig. 6. K-means clustering of the Franco-Flemish Motet corpus

Once again, the Lydian mode is not distinguishable, which shows that accidentals and/or ficta consistently lower the ♯4. All songs notated in Lydian mode belong to cluster 4, Ionian.


Secular Songs. The secular song corpus, however, clusters best in two modes with a slight secondary peak at four modes (Figs. 7a and 7b). The clusters are not as distinct as those of later corpora, but it is clear that two modes represent this secular song corpus better than any other number of modes. Like those of the Bach and alfabeto corpora, the centroids of the Franco-Flemish secular songs are again major and minor:

1. Minor (Aeolian) 

2. Major (Ionian) 



(a) K-means graph: distance is proportional to difference
(b) Silhouette and completeness scores of the Franco-Flemish secular songs corpus

Fig. 7. K-means clustering of the Franco-Flemish secular songs corpus


The differences between the secular songs and the sacred genres are quite significant, although the two sprawling clusters shown in Fig. 7a do show some modal drift. For example, the notated Phrygian songs are mostly at the bottom of cluster 1. Likewise, the notated Mixolydian songs are slightly to the left of the notated Ionian songs in cluster 2. However, it is quite possible that a larger data set would fill in those gaps. Despite the possible modal drift, the silhouette scores in Fig. 7b clearly show that two modes best describe the secular song corpus.


4 Discussion 

The results of this study show that different genres have different modal frameworks even if composed within the same time period, in the same country, and even by the same composer. Despite their notation, secular genres cluster into only two modes long before sacred genres do. This leaves open speculation about the modal framework of vernacular genres that were not notated. Perhaps the alfabeto corpus is the closest we can come to vernacular music [4,7,12]. Further digitization of early music can extend this study, especially in the seventeenth century. Given enough music, comparisons could also be made geographically or perhaps by instrumentation.

Of course, understanding the harmonic structures of a corpus involves more than pitch-class frequency: it also involves how those pitches are grouped 5 and move from one to another. 6 Still, this study is an important first step. It gives a view of the modal framework in early music that provides a foundation for other ways of investigating harmonic practice: a foundation that recognizes the different harmonic practices of secular and sacred genres, and that compositional practices were not always in line with the notational or theoretical conventions of their time.

Abstract. Cross entropy, a measurement of the complexity/predictability of a series of observations given a probabilistic model, has been used in a variety of domains in music scholarship for decades. This paper presents a novel application of this metric to musical corpus analysis. Given a series of divisions of a larger corpus, a sub-corpus is relatively “unique” if a probabilistic model derived from its pieces better predicts its constituent pieces than do models derived from other sub-corpora. A sub-corpus is relatively “coherent” if its own model describes its pieces better than a model derived from the entire corpus. The Yale-Classical-Archives corpus was used to illustrate several strategies for sub-corpus division, each of which is tested for uniqueness and coherence. Some broader interpretive applications are also described.

Keywords: Computation • Corpus analysis • Cognitive modeling • Style 

1 Introduction 

Music researchers have been experimenting with the concept of entropy almost since the field of informatics began in the mid-20th century [1,2]. Since the 1950s, scholars have connected entropy, or the relative complexity of some signal, to musical style, communication, normativity, meaning, and compositional modeling [3-7]. As shown in Eq. 1, the entropy H of an observed series O measures the complexity of a signal by calculating the log-probability of an event $o_i$ occurring within some observed series O (here, $\log P(o_i)$), weighting that value by the relative frequency with which that observation occurs in O (here, $P(o_i)$), and summing all such values. The negative sign turns the negative value resulting from the logarithm into a positive value, such that the higher the value, the more randomness (or the more entropy) a series has. A very redundant signal, one in which a particular event happens most of the time, will have low entropy, since those events are highly predictable given the rest of the signal, while a series of wildly unpredictable events will have high entropy. (Here, the logarithm’s base can be chosen as appropriate for the situation: this study uses base 2 in order to report entropy in bits.)

$$H(O) = -\sum_{i=1}^{n} P(o_i) \log P(o_i) \qquad (1)$$

In the past several years, work by David Temperley [8] has introduced a particular modification of this technique to music: cross entropy. In this framing of the general concept, the probability of each event is judged by some other model rather than by a probability distribution drawn from the observation itself. As shown in Eq. 2, this formula takes the negative log-probability of an event o given some probabilistic model m, again weighting each value by its probability mass within the observation series O, and then summing for all n events. This essentially captures how well some probability distribution m accounts for the series of observations O.

$$H_m(O) = -\sum_{i=1}^{n} P(o_i) \log P_m(o_i) \qquad (2)$$
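Both measures can be computed in a few lines. The following Python sketch follows the prose descriptions of Eqs. 1 and 2 with base-2 logarithms; the function names and the chord labels are hypothetical, and the model is assumed to assign a nonzero probability to every observed event (otherwise some form of smoothing would be needed).

```python
from collections import Counter
from math import log2


def entropy(observations):
    """Entropy in bits of a series, using its own relative frequencies."""
    n = len(observations)
    return -sum((c / n) * log2(c / n) for c in Counter(observations).values())


def cross_entropy(observations, model):
    """Bits needed to describe the series when probabilities come from `model`."""
    n = len(observations)
    return -sum((c / n) * log2(model[event])
                for event, c in Counter(observations).items())


series = ["I", "I", "V", "V"]
print(entropy(series))                                   # 1.0
print(cross_entropy(series, {"I": 0.5, "V": 0.5}))       # 1.0
print(cross_entropy(series, {"I": 0.25, "V": 0.75}) > entropy(series))  # True
```

A model that matches the series' own distribution reproduces its entropy; any other model yields a higher value, which is what makes cross entropy usable as a measure of fit.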


This paper proposes several novel ways of applying this modeling technique to the analysis of musical data. I will show how cross entropy can capture the coherence and the uniqueness of musical corpora. In one sense, using a composer’s identity to build a corpus creates an unassailably coherent and unique dataset: in this framework, the composer’s identity provides the criterion for whether a piece is included in some corpus. But one might also wonder whether a composer writes pieces that are distinct from those of their contemporary colleagues, or whether a composer’s style is basically interchangeable with that of their contemporaries. If the former were true, the composer’s pieces would exhibit notably divergent statistical properties from those of their colleagues; but if two composers have made virtually identical decisions surrounding some musical parameter, then the same statistical model could represent both corpora.

This paper argues that cross entropy can shed light on these sorts of questions by manipulating which models are used to assess the corpus. Given some corpus with potential smaller divisions (or sub-corpora), if the individual pieces within some sub-corpus are predicted better by a model of that sub-corpus than by a model of any other sub-corpus, that sub-corpus is unique as compared to the other sub-corpora. If that sub-corpus contains pieces that are more statistically similar to one another than to the overall corpus, that sub-corpus is coherent. Below, I show a computational model that exploits these properties to test the coherence and uniqueness of several different divisions of a large corpus of Western-European common-practice MIDI files. I end by discussing the interpretive potential that this modeling provides.
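The two tests can be sketched on toy data as follows. This is a simplified illustration under several assumptions that are not taken from the study: events are modeled with add-one-smoothed unigram probabilities rather than trigrams, each model is evaluated on the same pieces it was built from, and all function names and chord labels are hypothetical.

```python
from collections import Counter
from math import log2


def unigram_model(pieces, vocabulary, alpha=1.0):
    """Add-alpha smoothed event probabilities estimated from a list of pieces."""
    counts = Counter(event for piece in pieces for event in piece)
    total = sum(counts.values()) + alpha * len(vocabulary)
    return {event: (counts[event] + alpha) / total for event in vocabulary}


def mean_cross_entropy(pieces, model):
    """Average per-piece cross entropy (bits per event) under `model`."""
    per_piece = [-sum(log2(model[e]) for e in piece) / len(piece) for piece in pieces]
    return sum(per_piece) / len(per_piece)


def is_unique(name, sub_corpora, vocabulary):
    """Own model predicts the sub-corpus better than every other sub-corpus model."""
    own = mean_cross_entropy(sub_corpora[name],
                             unigram_model(sub_corpora[name], vocabulary))
    return all(
        own < mean_cross_entropy(sub_corpora[name], unigram_model(pieces, vocabulary))
        for other, pieces in sub_corpora.items() if other != name
    )


def is_coherent(name, sub_corpora, vocabulary):
    """Own model predicts the sub-corpus better than the whole-corpus model."""
    everything = [piece for pieces in sub_corpora.values() for piece in pieces]
    own = mean_cross_entropy(sub_corpora[name],
                             unigram_model(sub_corpora[name], vocabulary))
    return own < mean_cross_entropy(sub_corpora[name],
                                    unigram_model(everything, vocabulary))


toy = {
    "A": [["I", "V", "I"], ["I", "V", "I", "V"]],
    "B": [["i", "iv", "i"], ["i", "iv", "iv"]],
}
vocab = {"I", "V", "i", "iv"}
print(is_unique("A", toy, vocab), is_coherent("A", toy, vocab))   # True True
```

In the toy example the two sub-corpora share no events, so each is trivially unique and coherent; with overlapping vocabularies the comparisons become informative.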

2 The Corpus, the Sub-corpora 

This experiment relied on data from the Yale-Classical-Archives Corpus [9]. This corpus collects MIDI files from classicalarchives.com (a website of user-sourced MIDI files), each associated with metadata that specifies the file’s opening key, meter, composer, date of composition, instrumentation, composer’s nationality, genre, and so on. Given that this study was interested in dividing corpora by composer, the 19 composers listed as “The Greats” on the website were used: Bach, Beethoven, Brahms, Byrd, Chopin, Debussy, Handel, Haydn, Liszt, Mendelssohn, Mozart, Saint-Saens, Scarlatti, Schubert, Schumann, Tchaikovsky, Telemann, Vivaldi, and Wagner. The overall corpus comprised more than 5,000 pieces, and the average composer’s dataset contained 231 pieces, with the smallest corpus (Wagner’s) containing only 33 and the largest (Scarlatti’s) containing 554. The corpus is divided into “salami slices”: every verticality at which the pitch-class content changes. The average composer’s sub-corpus had 339,185 such slices, with Wagner’s again being by far the smallest (67,538) and Mozart’s being the largest (1,322,716). The corpus also contains tonal annotations, which were used to convert the pitch material of each slice into scale degrees.

These scale-degree sets were used to create Markov (n-gram) chains designed to 
probabilistically model how surface harmonies progressed to one another. Different 
sizes of n-grams within these tonal passages were then tallied, and after initial 
experimentation it was determined that trigrams (i.e., n = 2) seemed to strike a balance 
between precise and sparse data. (An n-gram model involves contiguous sequences of 
items from a sequence of observations; here, n counts the preceding observations on 
which the current one is conditioned. When n = 2, the observation at the current 
timepoint is conditioned on the two previous observations. The model is therefore 
concerned with three-chord trigrams, i.e., the current and previous two chords, at every 
observed timepoint.) In order to remain as theory-neutral as possible, the meter 
metadata was used to gather trigrams at three metric levels; these three levels were then 
combined. Repeating data collection at several levels and agglomerating the resulting 
trigrams allows for patterns that recur at several durational or metric levels to become 
more dominant in a distribution while remaining agnostic as to the relative importance 
of different surface divisions. The three metric levels were (1) the salami slices 
themselves, (2) the contents of each beat as defined by the corpus’s metric data (e.g., the 
quarter-note in 4/4), and (3) the contents of the beat’s primary division (e.g., the eighth 
note in 4/4; this division is also recorded by the corpus). NB: this process recognizes 
not only traditional chords (like triads and seventh chords) but also less traditional 
chords (like passing chords and dissonances): this study therefore assumes that any 
surface structure is a legitimate “chord,” following [10-12]. The tallying and 
organization of the YCAC’s trigrams was implemented with Python version 2.6 using the 
music21 software package [13]. 
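
The pooling of trigrams across metric levels described above can be sketched as follows. This is a minimal illustration in modern Python rather than the study's own code: it assumes each metric level has already been reduced to a sequence of scale-degree sets, and the names `Chord` and `tally_trigrams` are hypothetical.

```python
from collections import Counter
from typing import FrozenSet, Iterable, Sequence, Tuple

# Assumption: a "chord" is any set of scale degrees, e.g. frozenset({1, 3, 5}).
Chord = FrozenSet[int]
Trigram = Tuple[Chord, Chord, Chord]


def tally_trigrams(levels: Iterable[Sequence[Chord]]) -> Counter:
    """Count every three-chord window at each metric level and pool the counts."""
    counts: Counter = Counter()
    for sequence in levels:
        # A sequence shorter than three chords contributes no windows.
        for i in range(len(sequence) - 2):
            window: Trigram = (sequence[i], sequence[i + 1], sequence[i + 2])
            counts[window] += 1
    return counts
```

Because the counts from all levels land in one `Counter`, a progression that recurs at the slice, beat, and division levels is counted once per level, which is how such patterns come to dominate the pooled distribution.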

In order to compare the uniqueness and coherence of various divisions of 
the larger dataset, the larger corpus was divided in several different ways. Most 
basically, each individual composer’s output was first considered a sub-corpus. 
Next, chronological divisions were used, grouping pieces in the corpus by their date of 
publication, first arranged by half-century beginning in 1650 and ending in 1900, 
and then by 30-year epochs (now beginning in 1680 because of the sparse data between 
1650 and 1679). Finally, to introduce machine-learned groupings into the corpus, the 
groupings found in [14] were used. There, the same dataset and modeling 
described above were used, and composers’ trigram frequencies were submitted to a 
k-means cluster analysis to group composers whose surface harmonic progressions 
were statistically similar. The study used values k = [0 ... 10]; peaks in silhouette widths 
were used to identify optimal k values, and such peaks were identified for 7 and 
10 clusters. The groupings, used here as sub-corpora, are reproduced in Table 1. 
Table 1. K-means clusters drawn from White (2014) 

k = 7                                               k = 10
--------------------------------------------------  -----------------------------------------------
Bach                                                Bach
Byrd                                                Byrd
Beethoven, Mozart, Haydn, Schumann, Mendelssohn,    Beethoven, Mozart, Haydn, Mendelssohn, Schubert
  Brahms, Schubert, Wagner
Tchaikovsky, Liszt, Chopin, Saint-Saens             Tchaikovsky, Liszt, Chopin, Saint-Saens
Telemann, Vivaldi, Handel                           Telemann, Vivaldi
Debussy                                             Debussy
Scarlatti                                           Scarlatti
                                                    Wagner
                                                    Brahms, Schumann
                                                    Handel

3 Modeling Coherence and Uniqueness 


The coherence and uniqueness of each corpus division was quantified by determining 
the cross entropy of each piece given every other piece (exclusive of the piece in 
question) in some sub-corpus. In terms of Eq. 2, for each piece, the observations 
O would be those trigrams within an individual piece within the sub-corpus, and the 
model m would be the probability distribution of all trigrams within the remaining 
pieces in that corpus. As a baseline, each piece within the sub-corpus was judged in 
relation to the entire corpus (here, the entire YCAC becomes the model m). The 
average and standard error of these cross entropies are tallied for each sub-corpus, as 
are those of the pieces in each sub-corpus as judged by the entire corpus. A sub-corpus is 
unique if the standard-error window of its self-assessment is sufficiently low that it 
does not overlap with the window produced by any other sub-corpus (“does this sub-corpus 
predict its own pieces better than any other sub-corpora above chance?”). A sub-corpus is 
coherent if the standard-error window of its self-assessments lies below, and does not 
overlap with, that of the overall corpus’s assessment (“does this 
sub-corpus predict itself better than it would be predicted by the entire corpus?”). 
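
The leave-one-out assessment and the window comparison can be sketched as follows. This is an illustration under stated assumptions, not the study's implementation: cross entropy is taken in its usual per-observation form (average negative log2 probability), additive smoothing is introduced here so that unseen trigrams keep a non-zero probability, and all function names are hypothetical.

```python
import math
from collections import Counter
from statistics import mean, stdev
from typing import Hashable, List, Sequence, Tuple


def cross_entropy(observations: Sequence[Hashable], model: Counter,
                  vocab_size: int, alpha: float = 1.0) -> float:
    """Average negative log2 probability of the observations under the model.

    Assumption: additive smoothing with weight `alpha` over `vocab_size` types.
    """
    total = sum(model.values()) + alpha * vocab_size
    return -sum(math.log2((model[o] + alpha) / total)
                for o in observations) / len(observations)


def self_assessment(pieces: List[List[Hashable]], vocab_size: int) -> List[float]:
    """Score each piece against the pooled trigrams of the other pieces."""
    pooled = Counter(t for piece in pieces for t in piece)
    scores = []
    for piece in pieces:
        rest = pooled - Counter(piece)  # exclude the piece in question
        scores.append(cross_entropy(piece, rest, vocab_size))
    return scores


def mean_and_stderr(scores: Sequence[float]) -> Tuple[float, float]:
    """Mean and standard error of a list of per-piece cross entropies."""
    return mean(scores), stdev(scores) / math.sqrt(len(scores))


def window_below(a: Tuple[float, float], b: Tuple[float, float]) -> bool:
    """True when a's mean +/- standard-error window lies wholly below b's."""
    (mean_a, se_a), (mean_b, se_b) = a, b
    return mean_a + se_a < mean_b - se_b
```

Under this sketch, a sub-corpus would count as coherent when `window_below(self_window, overall_window)` holds, and as unique when the same test holds against the window obtained from every other sub-corpus's model.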

NB: As cross entropy is itself a relative measurement, so too are uniqueness and 
coherence. Each of these numbers must only be judged in relation to other numbers: a 
piece is only unique in relation to other sub-corpora or only more coherent than the 
overall corpus. 

Importantly, both these ideas have conceptual overlaps with the central idea of 
“entropy.” When applied to a single dataset (i.e., using the format of Eq. 1), entropy 
rises when each event is more random in terms of the other events, and falls when each 
event is more predictable. Uniqueness and coherence manipulate these relationships 
by comparing a dataset’s randomness not simply to the dataset itself, but to other 
potential datasets with which the original dataset has some relationship. In other words, 
these ideas capitalize on the original informatic structure of entropy to draw out 
additional relationships between datasets. Note also that the difference between 
uniqueness and coherence is not mathematical in nature (indeed, they are 
mathematically identical), but lies in the relationships between the models used, with uniqueness 
quantifying relationships between sub-corpora and coherence quantifying relationships 
between a corpus and its sub-corpora. 

4 Sub-corpora 

4.1 Dividing by Composer 

Dividing the corpus by composer, 74% of composer-by-composer comparisons were 
significantly unique. The median proportion of unique comparisons was 83%. 88% of 
these sub-corpora predicted themselves better than the overall corpus. Only two 
composers, Byrd and Handel, registered perfect results: the trials produced 
significantly lower cross entropy when comparing these composers’ own pieces to their own 
corpora than when comparing them to any other composer’s corpus. In other words, 
these results show the models to be “sure” these composers’ pieces were significantly 
more likely on average to be composed by themselves than by someone else. Example 
1a shows Handel’s sub-corpus compared to that of each other composer. The cross 
entropies of the composer’s pieces when compared to the other composers’ sub-corpora 
are shown by solid bars, the self-wise 
comparison is shown by the white bar, and the dashed bar shows Handel’s average 
cross entropy judged by the entire corpus. Handel’s own pieces are judged statistically 
significantly better by his own corpus than they are judged by other corpora; the corpus is 
therefore unique. The corpus also judges itself better than it is judged by the overall 
corpus; it is therefore coherent (Fig. 1). 
Fig. 1. Comparative cross entropies using Handel’s sub-corpus model 


However, more than a quarter of the time, these trials judged other corpora to 
predict a composer’s pieces with either a lower or insignificantly different level of cross 
entropy when compared to the composer’s own corpus. Mendelssohn’s corpus, for 
instance, performed around the average with two non-unique comparisons: the cross 
entropies of the Brahms, Handel, and Schubert sub-corpora were not significantly 
different from the cross entropy resulting from a self-comparison. However, the corpus 
does predict itself significantly better than the agglomerated corpus predicts its pieces. 
This result indicates that the Mendelssohn model is coherent insomuch as it predicts its 
own pieces well; however, it also shows that the model is not sufficiently unique, as 
other models predict Mendelssohn’s corpus virtually identically to Mendelssohn’s own 
model. 

On the whole, these results suggest that grouping corpora by composer 
tends to create coherent corpus models, although these models are often not sufficiently 
unique from one another. 

4.2 Dividing by Chronological Epochs 

Here, pieces were divided into sub-corpora based on their date of composition, first into 
fifty-year epochs, and then in thirty-year epochs. The fifty-year model performed worse 
than that using composer-defined trials, the former returning a 68% uniqueness rate. 
However, the median success rate was higher, registering an 80% uniqueness rate. This 
rate stems from the fact that one time period, 1751-1800, did not have a single 
successful trial; this epoch also did not predict itself better than did the overall corpus. 
Example 2a shows the offending epoch’s results. Not only can the late 18th-century 
corpus not be significantly distinguished from the late 17th-century corpus, but the 
other three corpora produce significantly lower cross entropies, indicating that these 
corpora predict the trigrams within the late 18th-century corpus better than they predict 
the trigrams of their own time periods. The fact that the overall corpus predicted its 
pieces better than did this sub-corpus also indicates that this sub-corpus is not coherent. 

Example 2b shows the case of the 1801-1850 corpus, a relatively successful 
example representing this test’s median. While a self-comparison yields the lowest 
average cross entropy, the average cross entropy when compared with the 1851-1900 
corpus is not significantly different from the self-wise average. (Interestingly, 75% of the 
unsuccessful judgments throughout the 50-year-epoch test (i.e., incorrect/insignificant 
comparisons) involved time periods adjacent to one another; if one removes the late 
18th-century results from the percentage, this number rises to a complete 100%. In other 
words, with the exception of the problematic late 18th century, the models generally 
become “confused” as to a piece’s time period only when comparing that piece to a 
chronologically adjacent corpus.) 


Fig. 2. Comparative cross entropies of (a) the 1751-1800 sub-corpus, and (b) the 1801-1850 
sub-corpus 
Dividing the corpora into 30-year segments produced similar results. The overall 
average success rate was 68.75%, and the median success rate was 75%. Half of the 
unsuccessful returns involved adjacent time periods, and 80% were within two time 
periods (i.e., within 60 years). As with the 50-year segments, the remaining 20% were 
not evenly distributed throughout the results, but centered in two particularly 
unsuccessful epochs. Two trials were not relatively coherent: the 1801-1830 and 1891-1920 
trials. Example 3a shows a median example, the 1741-1770 corpus, while Example 3b 
shows the largely unsuccessful 1801-1830 results. (While it may be satisfying that the 
only significant positive results involve corpora that are maximally chronologically 
distant from the 1801-1830 corpus, note that the two adjacent corpora register a 
significant but lower cross entropy.) 
Fig. 3. Comparative cross entropies of (a) the 1741-1770 sub-corpus, and (b) the 1801-1830 
sub-corpus 


These results suggest that dividing a corpus by chronological epochs may be 
successful in some respects - it creates a high median success rate - but it also 
generates several corpora that are incoherent. From a modeling perspective, this 
incoherence could be explained by the presence of multiple and distinct 
chord-progression practices within a single corpus. For instance, the 1801-1830 corpus 
seems to have properties that are better modeled by the corpora surrounding it, perhaps 
indicating that this era contains practices that overlap those of its two surrounding 
epochs. If dividing corpora by composers seemed to create too many divisions, 
dividing by chronology is too broad, creating incoherent corpus models. Also, these 
tests seem to indicate a connection between chronological proximity and models’ 
similarities. 

4.3 Machine-Learned Sub-corpora 

Using the k-means clusters produced markedly better results, although this is somewhat 
unsurprising, as the test used the same metric (chord-progression probabilities) both to 
divide the corpora and to judge the success of those divisions. (However, the results of 
this test do confirm the power of harmonic transition probabilities to classify groups of 
composers into unique and coherent corpora.) The seven clusters provide nearly perfect 
results, with only Debussy’s corpus providing insignificant/non-unique comparisons, 
likely due to its small membership (n = 60). Figure 4a shows a typical perfect 7-cluster 
trial, using the “Romantic” (Tchaikovsky, Chopin, Liszt, Saint-Saens) sub-corpus. The 
ten clusters performed slightly worse, with an 88% success rate. If, however, one 
discounts the insignificant results of the two smallest corpora, now adding Wagner’s 
corpus (n = 32) to Debussy’s, the results rise to a 97.22% success rate. Figure 4b 
shows one of the two remaining insignificant results, the other being the average cross 
entropy of Vivaldi/Telemann’s pieces given Handel’s corpus. 
Fig. 4. Comparative cross entropies of (a) the Tchaikovsky, Chopin, Liszt, and Saint-Saens 
sub-corpus, and (b) the Brahms and Schumann sub-corpus 


5 Applications 

This type of modeling has various applications in how we think about and interpret 
musical corpora and the works contained therein. In what follows, I outline four 
potential applications of this kind of modeling, showing ways it can be used to interpret 
stylistic trends and compositional schools, how it can be used to identify points of 
innovation, how it can be used to broach the (admittedly thorny) topic of authorship, 
and how it might be used to formalize models of historical styles. 

5.1 Describing Stylistic Trends and Compositional Schools 

When the model identifies several composers whose sub-corpora are not unique but, 
when grouped together, create a unique and coherent sub-corpus, this potentially 
identifies a compositional cohort operating within a similar compositional school. Here, 
we imagine that the compositional trends and norms used by these composers are 
sufficiently similar that the variation within their outputs makes them statistically 
indistinguishable (at least within the tested parameters). Furthermore, non-unique 
comparisons can show other potential avenues of influence. For instance, in Fig. 4, 
Brahms and Schumann’s sub-corpus is coherent and unique in all but the comparison to 
the Tchaikovsky-Liszt-Saint-Saens-Chopin sub-corpus. This suggests that the output 
of these two composers comprises a distinct style that influences the output of these later 
Romantic composers. The fact that chronological adjacencies within the epoch-based 
models frequently accounted for non-unique comparisons also suggests stylistic trends. 
Here, this non-uniqueness captures the chronological developments of historical styles: 
historically proximate sub-corpora share statistical tendencies. 

5.2 Moments of Innovation 

Non-unique and incoherent findings also provide an opportunity for interpretation. At 
these junctures, the lack of similarity within the pieces constituting the sub-corpus begs 
for some kind of explanation: why would pieces written within such chronological 
proximity be so different? 

Consider the case of the 1751-1800 sub-corpus: its constituent pieces are better 
predicted by the statistics of neighboring historical epochs than by the pieces in its own 
time period. Looking inside that dataset, one finds groups of composers who would 
seem to be drawn from divergent compositional practices. It is not only a time period 
that saw the late works of Telemann and Scarlatti, but also the complete works of 
Mozart and Haydn, and ended with the mature works of Beethoven. One could 
similarly describe the 1801-1830 corpus: such a division groups middle-period Beethoven 
not only with Schubert, but with Schumann’s early works. The incoherence of these 
sub-corpora, then, supports the idea that these moments host more of a stylistic shift 
than their surrounding eras. As indicated in Figs. 2a and 3b, it seems that significant 
portions of these groupings are better predicted by the surrounding epochs than by 
contemporary compositions, further suggesting that these eras feature dramatic shifts 
between the previous and following styles. 

5.3 Authorship 

This modeling technique also allows for potential evaluations of reproductions, 
completions, or potentially spurious compositions. In each of these instances, one could 
take the piece(s) in question and treat them as their own sub-corpus, comparing its 
coherence and uniqueness with other sub-corpora in the piece(s)’ historical orbit. For 
instance, Fig. 5 compares a famous example of forgery to the 10-cluster sub-corpora. 
The forgeries here are those that Nicolas Chedeville published in 1737 as Vivaldi’s 
fictitious “Opus 13.” The “X” above each of the bars shows the corpus’s self-wise cross entropy, 
each constituent bar shows the forgeries’ cross entropy compared to other sub-corpora, 
and the final bar again shows the agglomerated corpus’s assessment of the forgeries. 
The sub-corpus is coherent, but it is not unique. Many other sub-corpora fall within the 
standard deviation of the average assessment: as before, these are shown as lined bars. 
Bars outside of the average standard deviation are shown as solid. There are two below 
the average: the Handel and Telemann-Vivaldi clusters. This means that these 
reproductions do indeed adequately imitate Vivaldi, but do so in a way that they could 
potentially also be passed off as composed by Handel! 
Fig. 5. Comparative cross entropies of Nicolas Chedeville’s Vivaldi forgeries compared to the 
10-cluster sub-corpora 


5.4 Generative Modeling 

Using these metrics to identify relatively unique and coherent statistical systems can 
potentially create well-formed generative models of some style. For instance, if one 
used the chord-progression (i.e., Markov-chain) probabilities embedded in, say, the 
Brahms-Schumann sub-corpus to generate successions of harmonies, one could 
reasonably argue that this models aspects of that style’s compositional norms. The same 
cannot be said of, say, a model drawn from the 1751-1800 sub-corpus: because of its 
incoherence, it is not clear what such a generative model would capture outside of 
manifesting the era’s stylistic heterogeneity. These metrics, then, can be imagined as 
ways to isolate statistical systems that can express some historically, culturally, or 
compositionally independent style. 
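
As a rough illustration of such a generative use, the sketch below walks a trigram table to produce a succession of harmonies. It assumes trigram counts keyed by three-chord tuples; the chord labels and the function names `build_transitions` and `generate` are hypothetical and do not come from the study.

```python
import random
from collections import Counter, defaultdict
from typing import Dict, Hashable, List, Optional, Sequence, Tuple

Context = Tuple[Hashable, Hashable]


def build_transitions(trigram_counts: Counter) -> Dict[Context, Counter]:
    """Index trigram counts by their two-chord context."""
    table: Dict[Context, Counter] = defaultdict(Counter)
    for (first, second, third), n in trigram_counts.items():
        table[(first, second)][third] += n
    return table


def generate(table: Dict[Context, Counter], start: Sequence[Hashable],
             length: int, rng: Optional[random.Random] = None) -> List[Hashable]:
    """Sample each next chord in proportion to its count after the last two."""
    rng = rng or random.Random()
    chords = list(start)
    while len(chords) < length:
        options = table.get((chords[-2], chords[-1]))
        if not options:  # unseen context: stop rather than guess
            break
        chords.append(rng.choices(list(options), weights=list(options.values()))[0])
    return chords
```

A table built from a coherent sub-corpus would, on the argument above, yield successions that reflect one style's norms, whereas a table built from an incoherent one would mix several practices.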


6 Future Work 

Of course, this work is incomplete. It relies entirely on simple Markov chains drawn 
from the very surface of a musical corpus. It is possible, for instance, that judging the 
similarity of two systems using something like a Context Free Grammar or at least 
some hierarchical system would better represent similarities and differences in chord 
progression usage. Additionally, other surface events rather than chord progressions 
may capture salient differences between sub-corpora: melodic figuration, recurrent bass 
lines, orchestration, or ornamentation may all contribute to stylistic differences better 
than (or in addition to) surface chord progressions. However, regardless of these 
potential avenues for future investigation, this study has identified a general method of 
using cross entropy to identify the uniqueness and coherence of various datasets, 
quantifying overlaps and consistencies within musical corpora. 
Abstract. This paper examines one aspect of Ligeti’s approach to writing 
music that is neither tonal nor atonal: the use of complementary 
collections to achieve what Richard Steinitz has termed combinatorial 
tonality. After a brief introduction, the paper explores properties of the 
intervallic content both within and between complementary collections, 
which I term the intra- and inter-harmonies. In particular, the 
inter-harmonies are useful in understanding harmonic control in works based 
on complementary collections, as demonstrated by revisiting Lawrence 
Quinnett’s analysis of Ligeti’s first Piano Etude, Desordre. 


Keywords: Ligeti • Desordre • Complementary collections • Combinatorial tonality 


1 Introduction 

Figure 1 shows the opening of Ligeti’s first Piano Etude, Desordre. A remarkable 
feature of this etude is that the right hand plays only the white keys while 
the left hand plays only the black keys. The etude thus systematically divides 
the aggregate into two quite familiar complementary collections: diatonic and 
pentatonic. Due to the overlapping registers (proximity) and similarity of contour 
in the ascending scalar fragments (common fate), it can be difficult to separate 
the two hands into independent psychological streams, thus making the diatonic 
and pentatonic collections difficult to hear separately. (See Bregman [3] for the 
importance of pitch proximity and common fate in the formation of independent 
auditory streams.) Instead, it is much easier to hear the between-hand 
note-against-note harmonic intervals, which we might term the inter-harmonies. 

As the etude progresses, the hands gradually drift apart, the accents in the 
two hands become desynchronized, and the durations between accents in each 
hand are gradually shortened, leading to a fragmentation of the scalar segments. 
Near the climax of the etude (see Fig. 66), the lack of both pitch proximity and 
common fate in the melodic contours strongly encourages the formation of two 
independent streams, making it difficult to hear the harmonic relations between 
the hands, but relatively easy to hear the within-stream intervallic relationships, 
which we might term the intra-harmonies. From the opening to the climax of 
Desordre there is thus a change in focus from the inter-harmonies to the 
intra-harmonies. Richard Steinitz [12] has referred to this interplay of complementary 
collections as “combinatorial tonality.” 
Fig. 1. Opening of Desordre 


While there are clear precedents in the use of complementary collections, 
Ligeti’s extensive use of opposed collections in his late works and, in particular, 
his exploration of the unfamiliar harmonic possibilities between collections and 
the ways in which a listener’s attention can be focused on the relations within 
(intra) and between (inter) collections is something very different. Indeed, the 
technique of complementary collections may represent Ligeti’s most systematic 
approach to achieving his goal of creating music that is neither tonal nor atonal 
in his late works. (See Ligeti’s own comments in [2].) As such, our lack of 
understanding of the harmonic relations between complementary collections takes on 
greater significance. The current paper explores this aspect of Ligeti’s 
combinatorial tonality, focusing on the relevant mathematical properties of 
complementary collections and the first part of Desordre, as a preliminary step to a 
greater understanding of Ligeti’s exploration and realizations of the theoretical 
properties and corresponding possibilities of complementary collections. 1 

2 Properties of Intra- and Inter-harmonies 

We begin examining the harmonic relationship between complementary 
collections by looking at the interval content (IC) from one pitch-class set to another 
using Lewin’s [8] interval function (IFUNC) 2: 

$$IC_{A,B}(k) = \mathrm{IFUNC}(A, B)(k) = |\{(a, b) \in A \times B : b - a = k\}|.$$

The interval function is a histogram of the pc intervals by which a member of 
one set can move to a member of the other set, yielding a 12-valued vector of pc 
interval multiplicities. For example, the interval content from {C, D} to {C♯, D♯} 
is (0, 2, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1), indicating that there are two ways to move by pc 
interval 1 (C→C♯ and D→D♯), one way to move by pc interval 3 (C→D♯), 
one way to move by pc interval 11 (D→C♯), and no ways to move by any other 
interval. 3 In the special instance of the interval content from a set to itself, the 
interval content within a set, we will use the shorthand $IC_{A,A} = IC_A$. 

1 My thanks to Nancy Rogers and an anonymous reviewer for comments that greatly 
improved this paper. 

2 The reader is strongly directed to Amiot [1] for an excellent and detailed presentation 
of the interval function and its relation to recent applications of the discrete Fourier 
transform in music theory. 

For complementary collections $A$ and $\bar{A}$, the intra-harmonies are a 
combination of the interval content within each collection separately: 

$$\mathcal{H}_{\mathrm{intra}} = IC_A + IC_{\bar{A}}.$$

For example, setting $W$ to be the white-key diatonic collection 
($W = \{0, 2, 4, 5, 7, 9, 11\}$), the intra-harmonies for the white-key/black-key 
complementary collections are given by 

$$\begin{aligned}
\mathcal{H}_{\mathrm{intra}} &= IC_W + IC_{\bar{W}} \\
&= (7, 2, 5, 4, 3, 6, 2, 6, 3, 4, 5, 2) + (5, 0, 3, 2, 1, 4, 0, 4, 1, 2, 3, 0) \\
&= (12, 2, 8, 6, 4, 10, 2, 10, 4, 6, 8, 2).
\end{aligned}$$

Similarly, for complementary collections, the inter-harmonies combine the 
interval content that obtains exclusively between the two collections: 

$$\mathcal{H}_{\mathrm{inter}} = IC_{A,\bar{A}} + IC_{\bar{A},A}.$$

For complementary collections $A$ and $\bar{A}$, $IC_{A,\bar{A}} = IC_{\bar{A},A}$. Thus, 

$$\mathcal{H}_{\mathrm{inter}} = 2 \cdot IC_{A,\bar{A}}.$$

For example, again setting $W$ to be the white-key diatonic collection, the 
inter-harmonies for the white-key/black-key complementary collections are given by 

$$\begin{aligned}
\mathcal{H}_{\mathrm{inter}} &= 2 \cdot IC_{W,\bar{W}} \\
&= 2 \cdot (0, 5, 2, 3, 4, 1, 5, 1, 4, 3, 2, 5) \\
&= (0, 10, 4, 6, 8, 2, 10, 2, 8, 6, 4, 10).
\end{aligned}$$

For a given pair of complementary collections, any pc interval must occur 
either within or between the collections, and thus the intra- and inter-harmonies 
exhaust the set of possible pc intervals: 

$$\mathcal{H}_{\mathrm{intra}} + \mathcal{H}_{\mathrm{inter}} = IC_{\mathbb{Z}_n} = (n, \ldots, n).$$
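
The definitions above are easy to check numerically. The following is a minimal sketch, assuming pitch classes are encoded as integers mod 12 with C = 0; the function names are illustrative and not taken from the paper.

```python
from typing import List, Sequence


def ifunc(a: Sequence[int], b: Sequence[int], n: int = 12) -> List[int]:
    """Interval function: entry k counts pairs (x, y) in a x b with y - x = k mod n."""
    counts = [0] * n
    for x in a:
        for y in b:
            counts[(y - x) % n] += 1
    return counts


def complement(a: Sequence[int], n: int = 12) -> List[int]:
    """Pitch classes of the n-note universe that are not in a."""
    members = set(a)
    return [pc for pc in range(n) if pc not in members]


def intra_harmonies(a: Sequence[int], n: int = 12) -> List[int]:
    """Interval content within a plus interval content within its complement."""
    comp = complement(a, n)
    return [p + q for p, q in zip(ifunc(a, a, n), ifunc(comp, comp, n))]


def inter_harmonies(a: Sequence[int], n: int = 12) -> List[int]:
    """Interval content from a to its complement plus the reverse direction."""
    comp = complement(a, n)
    return [p + q for p, q in zip(ifunc(a, comp, n), ifunc(comp, a, n))]


WHITE = [0, 2, 4, 5, 7, 9, 11]  # white-key diatonic collection
```

Calling `intra_harmonies(WHITE)` and `inter_harmonies(WHITE)` reproduces the two vectors given above, and adding them entry by entry gives 12 in every position.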

Figure 2 shows graphs of the intra- and inter-harmonies for two different pairs 
of complementary collections. Note that the distributions of intra- and 
inter-harmonies for Fig. 2a are fairly uneven, while the distribution in Fig. 2b is nearly 
flat. Collections in which the interval content is highly uneven may be thought 
to be more distinctive, since the interval content is dominated by only a few 
(and therefore salient) pc intervals. Contrarily, when the interval content is very 
flat, the corresponding collections cannot be typified by a limited number of 
distinct pc intervals. In this sense, there is a strong correlation between the 
“distinctiveness” of a collection and the unevenness of its interval content. 

3 Multiplicity of pc interval i is indicated by the ith component of the interval function, 
which begins with pc interval 0. 
Fig. 2. Histograms of intra- and inter-harmonies for (a) white key/black key collections 
and (b) “flat” hexachord {C, D, D♯, E, F, G♯} and its complement 


We can measure the unevenness of a collection’s interval content by taking 
its standard deviation. For example, the interval content of the “flat” hexachord 
from Fig. 2b has a standard deviation of $\sigma(IC_{\{0,2,3,4,5,8\}}) = 1.0$, while that of 
the whole-tone collection (with its maximally uneven interval content of all even 
intervals and no odd intervals) has a standard deviation of 
$\sigma(IC_{\{0,2,4,6,8,10\}}) = 3.0$. The standard deviation of the white-key interval 
content lies between these two extremes: $\sigma(IC_{\{0,2,4,5,7,9,11\}}) \approx 1.66$. 4 

As measured by the standard deviation of interval content, a collection and 
its complement are equally distinctive 5: 

$$\sigma(IC_A) = \sigma(IC_{\bar{A}}).$$

Moreover, the distinctiveness of the intra-harmonies is the same as that of the 
inter-harmonies: 

$$\sigma(\mathcal{H}_{\mathrm{intra}}) = \sigma(\mathcal{H}_{\mathrm{inter}}).$$
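
A short numerical check of these statements follows. It is a sketch that assumes the population standard deviation (`statistics.pstdev`), which is the convention that reproduces the values 1.0, 3.0, and 1.66 quoted above; the function names are illustrative only.

```python
from statistics import pstdev
from typing import List, Sequence


def ifunc(a: Sequence[int], b: Sequence[int], n: int = 12) -> List[int]:
    """Interval function: entry k counts pairs (x, y) in a x b with y - x = k mod n."""
    counts = [0] * n
    for x in a:
        for y in b:
            counts[(y - x) % n] += 1
    return counts


def complement(a: Sequence[int], n: int = 12) -> List[int]:
    """Pitch classes of the n-note universe that are not in a."""
    members = set(a)
    return [pc for pc in range(n) if pc not in members]


def distinctiveness(a: Sequence[int], n: int = 12) -> float:
    """Population standard deviation of a collection's interval content."""
    return pstdev(ifunc(a, a, n))


FLAT = [0, 2, 3, 4, 5, 8]
WHOLE_TONE = [0, 2, 4, 6, 8, 10]
WHITE = [0, 2, 4, 5, 7, 9, 11]

for name, pcs in [("flat", FLAT), ("whole-tone", WHOLE_TONE), ("white-key", WHITE)]:
    # Each collection is printed next to its complement for comparison.
    print(name, round(distinctiveness(pcs), 2),
          round(distinctiveness(complement(pcs)), 2))
```

In each printed row the two numbers agree, illustrating that a collection and its complement are equally distinctive under this measure.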

Since Ligeti’s favorite complementary collections, including diatonic, pentatonic, 
whole-tone, Guidonian and similar collections, are all highly distinctive, this 
ensures that the interval content between these collections and their complements 
will also be highly distinctive. 

This does not, however, guarantee that the intra- and inter-harmonies will 
also be highly differentiated. In order to measure this differentiation, we can take 
the magnitude of the difference of the two vectors: 

$$\|\mathcal{H}_{\mathrm{inter}} - \mathcal{H}_{\mathrm{intra}}\|_2.$$


4 The distinctiveness of a collection, $A$, can also be measured in terms of the magnitude 
of its interval content, $\|IC_A\|_2$. (See Callender [4].) For the present purposes, the 
standard deviation is preferable. 

5 This follows directly from the complement theorem. See Hanson [6] and Lewin [7]. 
For example, let $W$ be the white-key collection and $X$ be the “flat” hexachord 
{0, 2, 3, 4, 5, 8}. The Euclidean distance between the intra- and inter-harmonies for 
the diatonic collection is nearly 23 
($\|\mathcal{H}_{\mathrm{inter}} - \mathcal{H}_{\mathrm{intra}}\|_2 \approx 22.98$), while the 
corresponding distance for the flat hexachord is only about 14 
($\|\mathcal{H}_{\mathrm{inter}} - \mathcal{H}_{\mathrm{intra}}\|_2 \approx 13.86$), reflecting the 
high differentiation of intra- and inter-harmony distributions in Fig. 2a and 
relatively low differentiation in Fig. 2b. 6 

More generally for complementary collections, there is a strong relationship 
between the differentiation of intra- and inter-harmonies and the distinctiveness 
of the collections. If complementary collections $A$ and $\bar{A}$ are the same cardinality, 
then 

$$\|\mathcal{H}_{\mathrm{inter}} - \mathcal{H}_{\mathrm{intra}}\|_2 \propto \sigma(IC_A).$$

(Specifically, for a chromatic universe of $C$ pitch classes, 
$\|\mathcal{H}_{\mathrm{inter}} - \mathcal{H}_{\mathrm{intra}}\|_2 = 4\sqrt{C} \cdot \sigma(IC_A)$.) 
If the cardinalities are nearly equal, then the relation is nearly, 
though not exactly, proportional. Thus, highly distinctive collections indeed 
possess highly differentiated intra- and inter-harmonies. By working with highly 
distinctive collections, Ligeti ensures that there will be maximal variation 
between the melodic and harmonic components of the resulting combinatorial 
tonality. 
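
The constant in the equal-cardinality case can be recovered in a few lines. The following is a sketch rather than the author's own derivation: it assumes that $\sigma$ denotes the population standard deviation, that the intra- and inter-harmonies sum to $C$ in every component, and that $IC_{\bar{A}} = IC_A$ when $|A| = |\bar{A}| = C/2$ (a consequence of the complement theorem); $\mu$ is introduced here for the mean of $IC_A$.

```latex
\begin{aligned}
\mathcal{H}_{\mathrm{intra}}(k) &= IC_A(k) + IC_{\bar{A}}(k) = 2\,IC_A(k), \\
\mathcal{H}_{\mathrm{inter}}(k) &= C - \mathcal{H}_{\mathrm{intra}}(k) = C - 2\,IC_A(k), \\
\mu &= \frac{1}{C}\sum_{k} IC_A(k) = \frac{|A|^2}{C} = \frac{C}{4}, \\
\mathcal{H}_{\mathrm{inter}}(k) - \mathcal{H}_{\mathrm{intra}}(k)
  &= C - 4\,IC_A(k) = -4\,\bigl(IC_A(k) - \mu\bigr), \\
\|\mathcal{H}_{\mathrm{inter}} - \mathcal{H}_{\mathrm{intra}}\|_2
  &= 4\sqrt{\sum_{k}\bigl(IC_A(k) - \mu\bigr)^2} = 4\sqrt{C}\,\sigma(IC_A).
\end{aligned}
```

For the flat hexachord, with $C = 12$ and $\sigma(IC_A) = 1.0$, this gives $4\sqrt{12} \approx 13.86$, in agreement with the distance reported above.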

3 Desordre 

Returning to the opening of Desordre (Fig. 1), we would like to answer the
following question: To what extent does Ligeti exert control over the
note-against-note harmonies in the etude? The opening of Desordre consists of two
layers that persist throughout the entire etude. The accented notes correspond to a
highly complex isorhythmic structure, detailed in Kinzler [9], in which the left
and right hands have very similar but non-identical colores (sequences of pitches)
and taleae (sequences of durations). While the accents of the two hands are
synchronized in the beginning of the etude, they quickly become misaligned, due
to the slight difference in their taleae. Unaccented notes are not a part of the
isorhythmic structure, but rather form a second layer consisting of generally
ascending scalar fragments used to smoothly connect the accented notes. Perhaps
the harmonic relations at any point are simply the result of the particular
and temporary configuration of the two hands within the overarching isorhythmic
structure. If this is the case, then over a large enough span of time the
observed harmonies will be equivalent to the result of repeated random selection
from the distribution of possible harmonies between the two collections. In
other words, as the etude progresses, the distribution of observed harmonies will
converge on the expected distribution of the inter-harmonies.
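As a rough illustration of what the expected distribution of inter-harmonies
looks like, the sketch below counts every pitch-class interval between two
collections. It assumes the usual description of the etude, with the right hand
confined to the white keys and the left hand to the black keys; the variable
and function names are hypothetical.

```python
from fractions import Fraction

# ASSUMPTION: right hand = white-key pitch classes, left hand = black-key pitch classes.
WHITE = [0, 2, 4, 5, 7, 9, 11]
BLACK = [1, 3, 6, 8, 10]


def inter_harmony_distribution(upper, lower, c=12):
    """Probability of each pc interval (upper minus lower, mod c) under random pairing."""
    counts = [0] * c
    for u in upper:
        for low in lower:
            counts[(u - low) % c] += 1
    total = sum(counts)
    return [Fraction(n, total) for n in counts]


expected = inter_harmony_distribution(WHITE, BLACK)
# Weights out of 35 for pc intervals 1..11.
weights = [f * 35 for f in expected[1:]]
```

Under this assumption the weights for pc intervals 1 through 11 stand in the
ratio 5:2:3:4:1:5:1:4:3:2:5, so interval classes 1 and 6 are the most likely
and interval class 5 the least likely between the hands.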

In the opening line of Desordre (Fig. 1), tritones and minor sixths
predominate, while there are almost no minor seconds/major sevenths or perfect
fourths/fifths. The lack of interval class (ic) 5 is easily explained by the relative
lack of this interval in the inter-harmonies. However, the relative lack of ic 1
may indicate some degree of control on the part of the composer, since there are
plenty of minor seconds/major sevenths spanning the two collections. Does this
favoring of tritones and minor sixths (major thirds) over minor seconds/major
sevenths persist?

⁶ Measuring the distance between intra- and inter-harmonies using other metrics, such
as angular (or cosine) distance, yields similar relative distances. (See Rogers [11].)
The Euclidean metric is sufficient and advantageous for the present purposes.

Figure 3 gives the normalized actual (observed) and expected interval
distributions for the first section of the etude, concluding with the climax on the
first downbeat on the sixth page of the published score.⁷ (Intervals are reckoned
between hands interpreted as pitch-class sets.) There is a noticeable emphasis on
ic 6 and de-emphasis on ic 1. In his dissertation on harmony and counterpoint in
Ligeti’s Etudes, Quinnett [10] presents a comparison of observed and expected
interval counts of the first section of Desordre divided into two parts, with the
second part beginning where the accents of the two hands temporarily become
(mostly) realigned ($t_2$, beginning just before the bottom system on page 2 of the
score). (See Figs. 4 and 5.) Quinnett notes that the interval profile of the first
part heavily favors tritones and minor sixths/major thirds over minor
seconds/major sevenths, while the interval content of the second part is much
more similar to the expected distribution.



Fig. 3. Histograms of observed and expected intervals in the first section of Desordre
(horizontal axis: pc intervals)

Why are the observed and expected distributions of the second part much
more similar than in the first part? Quinnett suggests that this is due to
the progressive rhythmic diminution of the isorhythmic structure that begins in
the second part. As the durations of the talea decrease, the density of accented
notes from the color increases, and the freedom that Ligeti had in his choice
of pitches diminishes. Thus, as discussed above, we would expect the interval
content to become increasingly governed by the distribution of intervals in the
inter-harmonies. While the histograms of Figs. 3 and 5 and Quinnett’s
explanations are suggestive, in what follows we will briefly consider the statistical
significance and size of the differences between the observed and expected
interval distributions, look at the progression of the observed intervals at a finer
level of detail, and consider alternative explanations for the convergence of observed
and expected intervals toward the end of the first section.⁸

⁷ Statistical analysis of Desordre was greatly aided by Cuthbert’s music21 [5], which
is a Python toolkit for computer-aided musicology.

Fig. 4. Passage before and after the realignment of accents, marked by the vertical bar
at $t_2$. Instances of interval class 1 are marked with asterisks.

Fig. 5. Histograms of observed and expected pc intervals in the (a) first part (from
the beginning to $t_2$) and (b) second part (from $t_2$ to the end of section 1) of the
first section of Desordre (after Quinnett [10])

In order to compare the actual distribution of between-hand intervals in
Desordre with the expected distribution based on the inter-harmonies, $\chi^2$
goodness of fit tests were conducted for various time spans within the first section.
In all cases the null hypothesis ($H_0$) is that the observed intervals are consistent
with the distribution of the inter-harmonies. The alternative hypothesis ($H_1$),
that the observed intervals are not consistent with this distribution, implies
that Ligeti is exerting control over the harmonic quality of a given time span in
ways that cut against simple scalar connections between notes of the isorhythmic
structure. Table 1 gives observed and expected interval counts in the first section
of Desordre along with the corresponding $\chi^2$ statistic and p-value. The test
confirms the intuition that the differences between the two distributions are highly
significant, though it does not address the size of this difference (see below).


Table 1. Contingency table of observed and expected intervals in the first section of
Desordre, standardized residuals, and corresponding $\chi^2$ statistic and p-value

Pc intervals         1      2      3      4      5      6      7      8      9     10     11
Observed            74     40     67     96     25    147     26    108     55     32     72
Expected         106.0   42.4   63.6   84.8   21.2  106.0   21.2   84.8   63.6   42.4  106.0
Std. residuals   -3.11  -0.37   0.43   1.22   0.83   3.98   1.04   2.52  -1.08  -1.60  -3.30

$\chi^2 = 50.05$, $p < 0.000001$


The standardized residuals ($(O - E)/\sqrt{E}$, where $O$ and $E$ are the observed and
expected counts, respectively) quantify the contribution of each pc interval to
the overall $\chi^2$ value. The most significant values in this row (shown in bold)
identify the categories that are driving the lack of fit between the two
distributions. In particular, observed pc intervals 1 and 11 are significantly less
frequent than expected, while pc interval 6 is significantly more frequent than
expected, confirming intuitions based on Figs. 3 and 5.
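The figures in Table 1 can be reproduced with a few lines of plain Python. This
is a sketch for checking the arithmetic, not the authors' analysis script: the
weights are simply the ratios of the Expected row of Table 1, the function
names are hypothetical, and the p-value uses the closed form of the chi-square
upper tail that holds for an even number of degrees of freedom (here 10).

```python
from math import exp, factorial, sqrt

OBSERVED = [74, 40, 67, 96, 25, 147, 26, 108, 55, 32, 72]  # Table 1, pc intervals 1..11
WEIGHTS = [5, 2, 3, 4, 1, 5, 1, 4, 3, 2, 5]                # ratios of the Expected row


def chi_square_gof(observed, weights):
    """Return expected counts, standardized residuals, and the chi-square statistic."""
    n = sum(observed)
    total = sum(weights)
    expected = [n * w / total for w in weights]
    residuals = [(o - e) / sqrt(e) for o, e in zip(observed, expected)]
    chi2 = sum(r * r for r in residuals)
    return expected, residuals, chi2


def chi2_upper_tail_even_df(x, df):
    """Upper-tail probability of a chi-square variable; valid only for even df."""
    if df % 2 != 0:
        raise ValueError("closed form requires an even number of degrees of freedom")
    half = x / 2
    return exp(-half) * sum(half ** i / factorial(i) for i in range(df // 2))


expected, residuals, chi2 = chi_square_gof(OBSERVED, WEIGHTS)
p_value = chi2_upper_tail_even_df(chi2, df=len(OBSERVED) - 1)
```

The squared residuals sum to the test statistic, which is why the largest
residuals in absolute value (pc intervals 1, 6, and 11) account for most of the
lack of fit.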

In Table 2 the first section of Desordre is divided into various subsections,
based on five time points measured in eighth notes from the beginning of the
etude: $t_0 = 0$ is the beginning of the etude where there are very few instances
of interval class 1 (see Fig. 1); $t_1 = 160$ marks the beginning of a passage with
slightly increasing presence of ic 1 (see Fig. 4); $t_2 = 248$ marks the realignment of
accents between the two hands, accompanied by a return to very few instances
of ic 1 (see Fig. 4); at $t_3 = 316$ durations of the isorhythm are progressively
shortened and ic 1 becomes much more prevalent (see Fig. 6a); and $t_4 = 634$ is
the end of the first section, which includes accents in both hands on every pulse
(see Fig. 6b). The final column of the table reports the phi coefficient, $\phi$, which
is a $\chi^2$-based measure of effect size: $\phi = \sqrt{\chi^2/n}$, where $n$ is the
number of samples in the data. Larger values for $\phi$ indicate a greater difference
between the two distributions.⁹

⁸ Comparison of interval counts with the inter-harmonies in Desordre in both
Quinnett’s treatise and the current paper stems from our conversations while Quinnett
was a student at Florida State University.

Table 2. Comparison of observed and expected intervals for various time spans in the
first section of Desordre with p-values and reduced phi coefficients ($\phi$) for
$\chi^2$ goodness of fit tests

Begin    End      p-value    $\phi$
$t_0$    $t_4$    <0.0001    0.26
$t_0$    $t_2$    <0.0001    0.60
$t_2$    $t_4$    0.91       0.10
$t_0$    $t_1$    <0.0001    0.72
$t_1$    $t_2$    0.01       0.49
$t_2$    $t_3$    <0.0001    0.79
$t_3$    $t_4$    0.96       0.10

Fig. 6. (a) Section before and after time point $t_3$; (b) section immediately before
the end of the first section ($t_4$). Asterisks indicate instances of interval class 1.
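As a quick check of the effect-size measure, the sketch below applies the phi
formula to the whole first section, using the test statistic reported in Table 1
and the total of its Observed row (742 intervals). The function name is a
hypothetical choice.

```python
from math import sqrt


def phi_coefficient(chi2, n):
    """Chi-square-based effect size: the square root of chi-square divided by sample size."""
    if n <= 0:
        raise ValueError("n must be positive")
    return sqrt(chi2 / n)


# chi-square from Table 1; n is the sum of the Observed row of Table 1.
phi_first_section = phi_coefficient(50.05, 742)
```

Rounded to two decimals this gives the value listed for the span from the
beginning to the end of the first section in Table 2.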

The results of Table 2 suggest that perhaps Ligeti exerted a finer degree of
intervallic control than can be captured by dividing the opening section into
two large parts as in Fig. 5. Row 1 of the table repeats the test from Table 1 of
the entirety of the first section. Rows two and three divide the section into two
parts. In the first part, from $t_0$ to $t_2$, the difference between observed and
expected intervals is significant and also has a larger effect size than the entire
section. In the second part, from $t_2$ to $t_4$, the differences are not significant and
the effect size is correspondingly very low. Time point $t_1$ divides the time span
from $t_0$ to $t_2$ into two subparts in rows four and five, and time point $t_3$
similarly divides the span from $t_2$ to $t_4$ in rows six and seven. In both pairs of
rows the first subpart differs strongly from the expected interval distribution, while
the effect size is lower in the second subpart due to the increase in the prevalence
of interval class 1. This is particularly true in the time span beginning at $t_3$. The
upshot is that changes in the interval distribution support a division of the opening
section into two parts, with a significant return to synchronized accents and avoidance
of ic 1 at $t_2$. This sense of return is enhanced by the slight increase in ic 1 in the
span from $t_1$ to $t_2$.

These changes in the intervallic distribution over the course of the first section
can be seen more clearly in Fig. 7, which plots $\phi$ for a moving window of 65
eighth notes. Here, $\phi$ is based on a $\chi^2$ goodness of fit test consisting of only
two categories of intervals: those that belong to interval class 1 and those that
do not. This graph demonstrates that the changes in harmonic content noted
in Table 2 happen abruptly rather than gradually. (This is evident even though
the transitions between regions of higher and lower values for $\phi$ are smoothed
by the moving-window analysis.) Regions of lower values for $\phi$ are either mostly
or almost entirely below the lines indicating various significance levels, whereas
regions of higher values are almost entirely above these lines. These abrupt
transitions as well as the return at $t_2$ of the intervallic content of the opening
complicate the earlier explanation of the changing harmonic distribution over
the course of the section. If these changes were simply the result of the progressive
rhythmic diminution of the isorhythmic structure (beginning at $t_3$) and the
consequent lack of harmonic freedom, the values for $\phi$ in Fig. 7 would remain
consistently high until $t_3$ and then gradually decrease. Ligeti appears to be exerting
control in switching from one distribution to the other.
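A moving-window analysis of this kind can be sketched as follows. This is an
illustration under stated assumptions, not the code behind Fig. 7: it assumes
one between-hand interval per eighth note, takes the expected probability of
interval class 1 as a parameter (2/7 if the Expected row of Table 1 is used,
since pc intervals 1 and 11 together account for 212 of 742 intervals), and
uses hypothetical names throughout.

```python
from math import sqrt


def windowed_phi(is_ic1, p_expected, window=65):
    """Phi for every full window of a two-category (ic 1 vs. other) goodness of fit test.

    is_ic1 holds one boolean per time point; p_expected is the expected
    probability of interval class 1 and must lie strictly between 0 and 1.
    """
    values = []
    for start in range(len(is_ic1) - window + 1):
        chunk = is_ic1[start:start + window]
        n = len(chunk)
        observed_ic1 = sum(chunk)
        expected_ic1 = n * p_expected
        expected_other = n - expected_ic1
        chi2 = ((observed_ic1 - expected_ic1) ** 2 / expected_ic1
                + ((n - observed_ic1) - expected_other) ** 2 / expected_other)
        values.append(sqrt(chi2 / n))
    return values


# Synthetic data: a stretch with no ic 1 followed by a stretch of only ic 1.
flags = [False] * 65 + [True] * 65
phis = windowed_phi(flags, 2 / 7)
```

With two categories the test has a single degree of freedom, so a window that
contains no ic 1 at all yields a fixed value that depends only on the expected
probability.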

To the extent that the quickening of the isorhythms plays a role in the
differing intervallic distributions of the two parts, might there be other factors
involved? Perhaps as the hands drift apart toward the registral extremes of
the piano, Ligeti became less concerned with note-against-note harmonies, since
the increased distance between the two hands encourages the perception of two
independent streams and makes it difficult to perceive the quality of the
between-hand intervals. The challenge in assessing the relative strengths of these two
explanations is that interval size and talea durations are strongly (inversely)
correlated.

⁹ Note that because there is more than a single degree of freedom in the data, $\phi$ is
not normalized to a maximum of 1.

Fig. 7. Moving-window analysis of observed vs. expected frequency of interval class
1 in the first section of Desordre. Larger values for $\phi$ correspond to a greater
difference between observed and expected frequencies. Window size is 65 eighth notes.
Expected frequencies are based on the inter-harmonies. Time points $t_1$, $t_2$, and
$t_3$ are indicated with vertical lines. Lines running across the graph indicate values
for $\phi$ corresponding to various levels of significance.

One approach to separating these factors is to divide intervals for each time 
point by size and whether or not an accent is present. (Recall that accents always 
and only accompany elements of the isorhythm, so the presence of accents can 
be used as an indicator for the presence of the isorhythm.) In Fig. 8 intervals 
throughout the first section of Desordre are divided into categories based on 
small (S) or large (L) interval size and presence (T) or lack (F) of accents. 
(Small intervals are no larger than two octaves. Pitch intervals involving octaves 
are reckoned from the lower note of the octave.) For example, the interval in the 
first eighth note of the etude belongs to the category ‘S-T’, since it is less than 
two octaves and the time point contains at least one accent. The interval of 30
semitones on the unaccented time point immediately before $t_3$ (Fig. 6a) belongs
to the category ‘L-F’.

The plot on the left of Fig. 8 shows the ratio of interval class 1 for each
category, while the plot on the right shows the ratio of interval class 6. (Recall
that interval classes 1 and 6 had the highest standardized residuals in Table 1
and were most responsible for the divergence between observed and expected
interval frequencies.) As the categories progress from ‘small intervals without
accents’ to ‘large intervals with accents’ there is a clear trend in the ratios of both
interval classes from their (de-)emphasis at the beginning of the etude toward
their expected ratios corresponding to the inter-harmonies, though not all of the
differences between categories are significant. Table 3 summarizes the significance
and effect size for interval classes 1 and 6 when holding interval size constant
while varying presence of accents and vice versa. For both interval classes there
is a significant and moderate effect of the presence of accents when the interval
size is small and of interval size when no accents are present. There is a small
and borderline significant effect of interval size when accents are present and a
notable lack of effect of the presence of accents when the interval size is large.
Taken together, the prevalence of these two interval classes differs noticeably
from the inter-harmonies when both the size of the interval is not greater than
two octaves and no accents are present; when neither of these conditions is
present, the prevalence of these interval classes does not differ strongly from the
inter-harmonies. Thus, both explanations seem to be justified: the isorhythmic
structure limited Ligeti’s freedom in controlling note-against-note harmonies, and
Ligeti exerted less control over harmonic intervals as the hands drifted apart,
forming independent streams, and focusing the listener’s attention on the intra-
rather than inter-harmonies.

Fig. 8. Prevalence of interval classes 1 and 6 for intervals divided into categories of
small or large interval size and with or without accent(s). (Small intervals are no larger
than two octaves.) Left panel: ratio of interval class 1; right panel: ratio of interval
class 6. Error bars: 95% confidence interval. Categories of intervals: S = small
intervals, L = large intervals, T = with accent(s), F = without accent.

Table 3. Significance and effect size for prevalence of interval classes 1 and 6 depending
on interval size and presence of accents

Categories                     ic 1                       ic 6
Small size, accent varies      p < .0001, $\phi$ = .23    p < .01, $\phi$ = .14
Large size, accent varies      p = .34, $\phi$ = .05      p = .60, $\phi$ = .03
No accent, size varies         p < .001, $\phi$ = .24     p < .02, $\phi$ = .15
1 or 2 accents, size varies    p = .07, $\phi$ = .08      p < .05, $\phi$ = .09
Abstract. This paper examines the computational problem of taking
a classical music composition and algorithmically recomposing it in a
ragtime style. Because ragtime music is distinguished from other musical
genres by its distinctive syncopated rhythms, our work is based on
extracting the frequencies of rhythmic patterns from a large collection
of ragtime compositions. We use these frequencies in two different
algorithms that alter the melodic content of classical music compositions
to fit the ragtime rhythmic patterns, and then combine the modified
melodies with traditional ragtime bass parts, producing new compositions
which melodically and harmonically resemble the original music.
We evaluate these algorithms by examining the quality of the ragtime
music produced for eight excerpts of classical music alongside the output
of a third algorithm run on the same excerpts; results are derived from a
survey of 163 people who rated the quality of the ragtime output of the
three algorithms.


Keywords: Algorithmic composition • Ragtime • Corpus-based study 


1 Introduction 

Ragtime is a musical genre that is best described by its syncopated, or ragged,
rhythms. Studies of ragtime compositional techniques have concluded that
syncopation is the single unifying characteristic of the genre. Though widely assumed
to be a term only applied to piano music, ragtime encompasses a wide variety of
instrumentations, techniques, and styles [1]. Widespread conceptions and
misconceptions regarding the ragtime genre have recently led to the creation of a
corpus of digitized ragtime compositions in the MIDI file format [10] to enable
large-scale studies of ragtime music. Not surprisingly, concurrent with the
development of this corpus and others like it is the rise of data-driven studies in
music informatics, with large-scale data sets being used to discover or confirm
trends and tendencies about classical and pop music alike [3,6]. In particular,
it is tempting to apply corpus techniques to the field of algorithmic composition
in order to automatically extract the unifying characteristics of a musical
genre and apply those patterns in a new composition. This is the problem we
examine here: we test the feasibility of composing ragtime music based solely on
probabilistically applying rhythmic patterns extracted from a corpus of ragtime
music to existing classical compositions. Specifically, we develop two databases
of rhythmic patterns derived from a corpus of roughly 5,000 ragtime pieces and
propose two algorithms that alter the rhythms of existing classical melodies to
sound more like the ragtime rhythms in the databases. We evaluate these
algorithms through a survey of 163 people who rated the quality of the ragtime music
produced by these algorithms.

2 Methodology and Algorithms 

The goal of this research is to study the feasibility of algorithmically composing 
ragtime music based solely on realigning existing classical music melodies to fit 
into ragtime rhythms. We algorithmically discover what rhythms are appropriate 
in ragtime by using a previously-created corpus of ragtime compositions, known 
as the RAG-collection, or the Rag-C data set. This data set is a collection of 
11,591 MIDI files of ragtime music originally introduced by Volk and de Haas 
[10] as a first effort in putting together a large-scale database of ragtime music. 
These ragtime compositions were originally compiled and organized by a group 
of ragtime music enthusiasts collaborating over the internet, and are sourced 
from various original ragtime scores, from piano rolls translated into the MIDI 
format, and from recordings of performances as well. Though the contents of the
data set vary in the quality of their MIDI translations, it is probably the most
comprehensive known collection of ragtime music in a symbolic digitized format.



Fig. 1. Methodological setup


Figure 1 illustrates the components in our research setup, which broadly
consists of a preprocessing stage and an experimental stage. The goal of the
preprocessing stage is to create two databases of rhythmic patterns. We do this by
processing the Rag-C data set with a beat induction algorithm; this is a necessary
step as the MIDI files in the corpus do not contain enough information to
automatically discover rhythmic information such as time signatures or measure
boundaries. After the beat induction algorithm aligns the music metrically, we
extract the melodies as monophonic sequences of notes using a standard skyline
algorithm. This leaves us with a series of measures which we further process into
two rhythmic pattern databases.

The goal of the experimental stage is to allow one to transform an existing
classical music composition into a ragtime composition. Beginning with a
classical music composition, we identify the monophonic main melody of the
composition and its harmonic chordal structure. We send the main melody to
the rhythm changer algorithms, which alter the metrical placement and duration
of the notes in the main melody (but not their ordering) to make the melody
sound more like ragtime. This is accomplished with the help of the information
in the rhythmic pattern databases. At the same time, we use the harmonic
structure of the classical music to compose a prototypical ragtime bass line, which is
combined with the altered melody into a final composition in a ragtime style.

We give further details of each step of this process below. 

2.1 Beat and Measure Detection 

Though MIDI files can be encoded to include information such as time signature 
and tempo, the files in the Rag-C data set are derived from a variety of sources, 
including live performances, some of which do not encode these data. Therefore, 
we use a version of Dixon and Cambouropoulos’s beat detection algorithm [4] 
to estimate the locations of musical beats in the MIDI data. 

The algorithm’s beat inducer operates by computing inter-onset intervals, 
or IOIs—times between pairs of note onsets—for the input MIDI file and then 
clustering them in the hope that small differences in the IOIs will be smoothed 
out. The clusters are then ranked by size, and the top-ranked clusters usually 
correspond to the inter-beat interval or a fraction thereof. With a correctly- 
predicted inter-beat interval, measure boundaries can be easily calculated for 
the entire piece. 

In practice, however, the inter-beat interval predictions may be slightly
miscalculated. We noticed a certain amount of “temporal drift” in the measure
boundary predictions for the Rag-C data set. Specifically, as one predicts
measure boundaries further and further ahead in a MIDI file, the predicted
boundaries deviate more and more from their true locations, most likely due to the
accumulation of small errors caused by a slightly miscalculated inter-beat
interval. To remedy this, we returned to the top-ranked clusters calculated by the
beat induction algorithm, and examined every possible inter-beat interval within
12 MIDI ticks of the cluster’s interval, calculated to the tenth of a tick. For every
potential inter-beat interval, we calculated the predicted measure boundaries for
that interval over the entire MIDI file, then binned all the notes of the MIDI
file according to which 16th note of the predicted measure they would fall into.
For correctly-predicted measure boundaries, we would expect this frequency
distribution of notes across the 16 bins to be weighted more heavily towards the
bins corresponding to strong beats, simply because it is more common, even in
ragtime, for notes to occur on strong beats. For incorrectly-predicted measure
boundaries, we would expect this distribution to be flatter. Therefore, we chose
the predicted set of measure boundaries that produced a frequency distribution
with the highest standard deviation as our correct measure boundaries.
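The refinement step described above can be sketched as a small search. This is
an illustrative reconstruction, not the authors' implementation: it assumes
four beats per measure, measures that start at tick 0, and notes assigned to
the nearest 16th-note position; the function names are hypothetical.

```python
from statistics import pstdev


def sixteenth_histogram(onsets, measure_len, bins=16):
    """Count note onsets per 16th-note position of the predicted measure."""
    counts = [0] * bins
    for t in onsets:
        pos = (t % measure_len) / measure_len        # position within the measure, in [0, 1)
        counts[int(round(pos * bins)) % bins] += 1   # ASSUMPTION: snap to the nearest 16th
    return counts


def refine_beat_interval(onsets, cluster_interval, beats_per_measure=4, radius_ticks=12):
    """Try every interval within radius_ticks of the cluster value, in tenths of a tick,
    and keep the one whose 16-bin histogram has the largest standard deviation."""
    center = round(cluster_interval * 10)
    best_interval, best_score = None, -1.0
    for tenths in range(center - radius_ticks * 10, center + radius_ticks * 10 + 1):
        interval = tenths / 10
        score = pstdev(sixteenth_histogram(onsets, interval * beats_per_measure))
        if score > best_score:
            best_interval, best_score = interval, score
    return best_interval


# Synthetic check: quarter notes every 480 ticks, cluster estimate off by 3 ticks.
onsets = [480 * k for k in range(1600)]
refined = refine_beat_interval(onsets, 483.0)
```

On this synthetic input any candidate other than 480 ticks lets the onsets drift
across neighboring bins over the 400 measures, which flattens the histogram and
lowers its standard deviation.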


2.2 Melody Extraction and Pattern Database Construction 

We use an adapted version of Temperley’s streamer algorithm [8] to split the 
Rag-C MIDI files into streams of notes. In order to isolate the main melodic 
voice in each file, a skyline algorithm is used as described in [9,10] to select a set 
of notes from these streams using the average pitch of all the notes in a given 
stream as its height. 

Though the Rag-C data set contains over 11,000 MIDI files, we used a specific 
set of 5,176 for this project. We omitted all files with changing tempos (mostly
from live performances) because our algorithm for detecting measure
boundaries assumed a fixed tempo. Additionally, there were many excessively long
MIDI files, containing many repeats of sections of the music, that could not
be processed by the melody extraction algorithm due to memory limitations.

Recall that our ultimate goal is to produce new ragtime music by realigning
classical music melodies to fit ragtime rhythms. In order to choose ragtime
rhythms appropriately during the algorithmic composition phase, we analyze
the rhythms of the melodies in the 5,176 MIDI files. We do this by assuming
all the ragtime compositions are in 2/4 or 4/4 time (a reasonable assumption
for ragtime), and segmenting each piece at the level of a 4/4 measure (merging
consecutive measures of 2/4 pieces). We represent the rhythm of each 4/4
measure using the method described in [5,10]: the rhythm of a measure is described
by a pattern of “I”s and “O”s specifying the locations of the note onsets at the
granularity of a 16th note, with an “I” standing for an onset and an “O” standing
for no onset at that time. For instance, a 4/4 measure with a quarter note on every
beat would be represented by the string “IOOOIOOOIOOOIOOO.” We refer to these
strings as binary onset patterns because their contents can equally be represented
by 1s and 0s instead of Is and Os.
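The encoding can be illustrated with a short sketch. The function name and the
choice of input (onset positions given as 16th-note indices within one 4/4
measure) are assumptions made for the example.

```python
def onset_pattern(onset_positions, slots=16):
    """Encode onsets (16th-note indices, 0-based) as a binary onset pattern string."""
    pattern = ["O"] * slots
    for pos in onset_positions:
        if not 0 <= pos < slots:
            raise ValueError(f"onset position {pos} lies outside the measure")
        pattern[pos] = "I"
    return "".join(pattern)


# A quarter note on every beat of a 4/4 measure.
quarters = onset_pattern([0, 4, 8, 12])
```

Because only onsets are recorded, two measures with the same attack points but
different note durations share the same pattern.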

Once every ragtime composition is converted into a sequence of binary onset
patterns, we create two rhythmic pattern databases, Version 1 (V1) and Version
2 (V2). The V1 database simply records the frequencies of every possible binary
onset pattern observed in the melodies of the 5,176 ragtime MIDI files. The
V2 database records the frequency of transitions between binary onset patterns
corresponding to every pair of adjacent measures in the corpus. In Sect. 3, we
describe some noteworthy facts that can be learned from examining the
information in the rhythmic pattern databases.
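Both databases amount to frequency tables and can be sketched with counters.
This is a minimal illustration, assuming each piece has already been reduced
to a list of binary onset pattern strings; the function name and the toy
corpus are hypothetical.

```python
from collections import Counter


def build_pattern_databases(pieces):
    """Return (v1, v2): pattern frequencies and adjacent-measure transition frequencies."""
    v1 = Counter()
    v2 = Counter()
    for measures in pieces:
        v1.update(measures)
        # Transitions are counted within a piece only, never across piece boundaries.
        v2.update(zip(measures, measures[1:]))
    return v1, v2


STRAIGHT = "IOOOIOOOIOOOIOOO"
SYNCOPATED = "IOOIOOIOOOOOIOOO"
toy_corpus = [[STRAIGHT, SYNCOPATED, STRAIGHT], [STRAIGHT, STRAIGHT]]
v1, v2 = build_pattern_databases(toy_corpus)
```

Here `v2[(a, b)]` is the number of times a measure with pattern `a` is followed
directly by a measure with pattern `b`.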
2.3 Experimental Phase

The experimental phase is designed to harness the information in the pattern 
databases in order to produce new ragtime compositions from classical music 
files. For testing and evaluation, we use a set of eight excerpts of classical music. 
These excerpts are taken from “Dance of the Sugar Plum Fairy” by Tchaikovsky;
“Eine kleine Nachtmusik,” by Mozart; Concerto No. 1 in E major, Op. 8, RV 269,
“Spring” by Vivaldi; three Christmas carols: “Deck the Halls”, “Hark! The Herald
Angels Sing”, “Jingle Bells”; and two traditional tunes: “Old MacDonald Had
a Farm” and “Yankee Doodle.” We chose these pieces for their easily-identified 
melodies and duple meters. 

We encoded the melody and harmony separately for these eight testing files 
and used them as input to the chord generation and rhythm changing algorithms, 
described next. 

2.4 Chord Generation 

The chord generation algorithm generates a ragtime-style bass line consisting of 
a sequence of chords. The input to the algorithm is a sequence of chord symbols, 
in this case from one of the eight classical music excerpts which have had their 
harmonies manually labeled. We turn these chord symbols into ragtime-style 
chord progressions based on a subset of the guidelines prescribed in [2]. We use 
a straightforward algorithm: we choose octaves or single bass notes on the first 
and third beats of a measure and chords on the second and fourth beats. The first 
beat is always the root of the current harmony, and the third beat is always the 
fifth. Additionally, we stochastically change some of the second- or fourth-beat 
chords into passing tones if the surrounding harmonic structure allows for this 
transformation. We found a probability of 1/6 works well for choosing whether 
or not to insert a passing tone. 
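The beat-by-beat layout described above can be sketched as follows. This is a
simplified illustration, not the authors' algorithm: it works with pitch
classes only, takes the fifth as seven semitones above the root, and leaves the
test for whether a passing tone is harmonically allowed to a caller-supplied
function, since that condition is not specified here. All names are
hypothetical.

```python
import random

MAJOR = (0, 4, 7)
MINOR = (0, 3, 7)


def ragtime_bass(chords, passing_tone=None, p_pass=1 / 6, rng=None):
    """Build one four-beat measure per (root_pc, quality) chord symbol.

    Beats 1 and 3 carry single bass notes (root, then fifth); beats 2 and 4
    carry the chord. passing_tone(chords, measure_index, beat_index) is a
    HYPOTHETICAL hook returning a pitch class, or None when no passing tone fits.
    """
    rng = rng or random.Random()
    measures = []
    for i, (root, quality) in enumerate(chords):
        triad = tuple((root + step) % 12 for step in quality)
        beats = [("bass", root), ("chord", triad),
                 ("bass", (root + 7) % 12), ("chord", triad)]
        for beat in (1, 3):  # zero-based indices of the second and fourth beats
            tone = passing_tone(chords, i, beat) if passing_tone else None
            if tone is not None and rng.random() < p_pass:
                beats[beat] = ("passing", tone)
        measures.append(beats)
    return measures


plain = ragtime_bass([(0, MAJOR), (7, MAJOR)])
```

Keeping the passing-tone decision behind a hook makes the stochastic part easy
to test in isolation from the fixed root, chord, fifth, chord skeleton.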

As an example, Fig. 2 shows the bass line generated from the first eight 
measures of the fourth movement of Beethoven’s Ninth Symphony (“Ode to 
Joy”). Notice how there is a passing tone generated in the transition from the 
end of measure 4 into measure 5. 




Fig. 2. Illustration of chord generation for “Ode to Joy.” The chord symbols above the 
staff are used as input, and the notes displayed are the output. Note the passing tone 
in measure 4. 
2.5 Rhythm Changing Algorithms

At the heart of this algorithmic composition system is the rhythm changing 
algorithm. Recall that our goal is to adjust the rhythm of a classical melody 
to fit a ragtime rhythm. We do this by identifying, for every measure of the 
classical input composition, a corresponding ragtime measure with the same 
number of notes, and altering the classical measure to fit the ragtime measure’s 
rhythm. For example, consider Fig. 3, which shows (on the top left) a measure 
of music taken from the Christmas carol “Deck the Halls,” and also (on the 
top right) a measure taken from the ragtime composition “The Entertainer,” by 
Scott Joplin. The rhythm changer would combine these measures into the new
measure of music at the bottom of the figure, which aligns the notes of the “Deck
the Halls” melody with the rhythm of “The Entertainer.”
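The core operation can be sketched in a few lines. This is a hedged
illustration: pitches are MIDI note numbers, the target rhythm is a binary
onset pattern string, and each note is assumed to last until the next onset,
which is a simplification introduced for the example. The function name and
the sample measure are hypothetical.

```python
def change_rhythm(pitches, pattern):
    """Place the pitches, in order, on the onsets of a binary onset pattern.

    Returns (pitch, start, duration) triples measured in 16th notes.
    ASSUMPTION: each note is held until the next onset or the end of the measure.
    """
    onsets = [i for i, symbol in enumerate(pattern) if symbol == "I"]
    if len(onsets) != len(pitches):
        raise ValueError("the measure and the pattern must have the same number of notes")
    ends = onsets[1:] + [len(pattern)]
    return [(pitch, start, end - start) for pitch, start, end in zip(pitches, onsets, ends)]


# Four melody notes mapped onto a syncopated four-onset pattern.
new_measure = change_rhythm([67, 65, 64, 62], "IOOIOOIOOOOOIOOO")
```

The order of the pitches is preserved; only their metrical placement and length
change.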


Fig. 3. An example of the rhythm-changing algorithm. We combine a classical melody
(top left) with a ragtime rhythm (top right), producing a new measure of
ragtime-sounding music (bottom)


We develop and evaluate two different strategies for using the rhythm changer
in conjunction with the V1 and V2 rhythmic pattern databases.

Recall that the V1 rhythm database simply stores the frequency of every
binary onset pattern in the corpus of ragtime MIDI files. Given a piece of classical
music as input, our goal is to probabilistically generate a set of rules that map
every binary onset pattern in the input music to a ragtime binary onset pattern,
to which we then apply the rhythm changer algorithm as described above. We
generate this set of rules by enumerating all the onset patterns in the input
classical composition on a measure-by-measure basis, and sorting them in order
of descending frequency. For every one of the classical onset patterns, we choose
a corresponding ragtime onset pattern proportionally to its frequency in the V1
database (keeping in mind that the number of onsets in the two patterns must be
equal), and then create a rule associating those two binary onset patterns. There
are two caveats. First, because un-syncopated rhythms (e.g. IOOOIOOOIOOOIOOO)
are so common in the data, rules that map an onset pattern to itself are never
permitted. Additionally, rules that would shift onsets by a total of more than
eight positions (where a position is an individual 16th note shift) are rewritten to
prevent drastic changes, such as a note at the beginning of a half measure being
shifted to the end of the measure. This technique is presented as Algorithm 1.
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The strategy for using the V2 database is similar to that of the VI database, 
except we examine pairs of binary onset patterns in the classical input and in 
the ragtime corpus. This technique is presented as Algorithm 2. 


Algorithm 1. Version 1 

for each unique binary onset pattern X in the input song do 
    Count the number of onsets in X 
    while a rule has not yet been generated do 
        Choose a random binary onset pattern Y with the same number of onsets in 
        the data set, weighted by its frequency 
        if Y ≠ X and onset shifting distance from X to Y < 8 then 
            Generate new rule X → Y 
        end if 
    end while 
end for 
for each measure in the input song with binary onset pattern X do 
    Generate measure in output song with note positions Y from the rule X → Y and 
    pitches from the original measure 
end for 
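
The rule-generation loop of Algorithm 1 can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the function names, the dictionary layout of the V1 database (pattern string mapped to frequency), and in particular the definition of the "onset shifting distance" as the summed displacement of the k-th onsets are assumptions made here. Filtering the admissible patterns first and then drawing once is used in place of the while loop; it yields the same distribution as rejection sampling and cannot loop forever.

```python
import random

def onset_positions(pattern):
    """Indices of the onsets (the 1s) in a binary onset string."""
    return [i for i, c in enumerate(pattern) if c == "1"]

def shift_distance(x, y):
    """Assumed metric: total 16th-note displacement between the k-th onsets."""
    return sum(abs(a - b) for a, b in zip(onset_positions(x), onset_positions(y)))

def make_rule(x, database, rng, max_shift=8):
    """Draw a ragtime pattern for x, weighted by its corpus frequency."""
    n = x.count("1")
    admissible = [y for y in database
                  if y.count("1") == n              # same number of onsets
                  and y != x                        # no identity rules
                  and shift_distance(x, y) < max_shift]
    if not admissible:
        return None  # no admissible ragtime pattern in the database
    weights = [database[y] for y in admissible]
    return rng.choices(admissible, weights=weights)[0]

def build_rules(measures, database, seed=0):
    """One rule X -> Y per unique onset pattern in the input song."""
    rng = random.Random(seed)
    return {x: make_rule(x, database, rng) for x in sorted(set(measures))}
```

The output rhythm of a song is then obtained by looking up each measure's pattern in the returned dictionary; the pitches are carried over by the rhythm changer.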


Algorithm 2. Version 2 

for each unique pair of binary onset patterns X and its subsequent measure Y in 
the input song do 
    Count the number of onsets in X and the number of onsets in Y 
    while a rule has not yet been generated do 
        Randomly select a transition Z from the transition table in which a measure 
        with the onset count of X transitions into a measure with the onset count of Y 
        if Z ≠ Y and onset shifting distance from Y to Z < 8 then 
            Generate new rule Y → Z 
        end if 
    end while 
end for 
for each measure in the input song with binary onset pattern Y do 
    Generate measure in output song with note positions Z from the rule Y → Z and 
    pitches from the original measure 
end for 


2.6 Syncopalooza Rhythm Changer 

To serve as a baseline algorithm, we implemented the Syncopalooza algorithm 
as described in [7]. This algorithm is neither data- nor corpus-driven, but rather 
manipulates the syncopations in a composition by shifting note onsets to stronger 
or weaker metrical positions individually, rather than by rewriting an entire 
measure of rhythm at once. 

3 Results, Survey, and Evaluation 

Some interesting results can be gleaned from the V1 pattern database, which 
records the frequency of every binary onset pattern in the 5,176 ragtime 
compositions. In general, the V1 database frequencies display a long-tailed distribution, 
as can be seen in Fig. 4; the correspondence between the frequency of a pattern 
and its rank in the list follows a power law relationship. Furthermore, the most 
commonly occurring binary onset patterns in ragtime music do not correspond 
to syncopated rhythms at all, but rather to simple rhythms such as a measure of 
two regularly spaced half notes (the most frequent pattern), a measure with one 
whole note (the second-most frequent pattern), or a measure of four regularly 
spaced quarter notes (the third-most frequent pattern). Though the data set 
contains 8,803 different binary onset patterns, these three patterns account for 
roughly 11% of all the measures in the corpus. It is noteworthy that the 
fourth-most common rhythm, with onsets on beats 1, 2, and 4 (but not 3), displays the 
characteristic “short-long-short” pattern of note durations which is especially 
prevalent in ragtime [5]. 

In order to evaluate the quality of the rhythm-changer algorithms presented 
earlier, we conducted two separate surveys. The surveys differed in length and 
in the participant demographics, but contained the same basic type of question. 
Each question in the survey asked the participant to listen to three different 
algorithmically-produced ragtime excerpts, one each derived from the V1 and 
V2 databases paired with the rhythm-changer algorithm, and the third from the 
Syncopalooza algorithm. The order of the three excerpts was randomized for 
every question. After listening to each excerpt as many times as the participant 
desired, they were asked how much they agreed with the statement “This excerpt 
sounds like ragtime” for each of the three excerpts. Their answers were recorded 
on a five-point Likert scale, with the choices of strongly disagree (1), disagree (2), 
neither agree nor disagree (3), agree (4), and strongly agree (5). 

The first survey was taken by 33 college undergraduates with some familiarity 
with ragtime music. Each participant was asked to evaluate 6 sets of excerpts 
(listening to 18 excerpts in total) according to the schema above. The second 
survey was taken by 130 different people solicited from an internet message board 
about piano music. 

The college students’ average responses to the questions on the Likert scale 
were 3.50, 2.83, and 2.99 for Syncopalooza, V1, and V2, respectively; while the 
internet users’ average responses were 3.36, 2.55, and 2.64, respectively. These 
values indicate that while the college students rated the output of all three 
algorithms slightly higher than the general internet population did, both groups 
preferred Syncopalooza to the algorithms presented in this study, though by 
less than one point. Furthermore, because even the best-performing algorithm— 
Syncopalooza—did not surpass a 3.50 rating, there is clearly plenty of room for 
improvement in the algorithms. 

Fig. 4. This graph illustrates the frequencies of the 20 most common binary onset 
patterns found in the ragtime corpus. 


Figure 5 illustrates the survey results grouped by the classical music piece 
used as input. We can see that Syncopalooza consistently outperforms both 
the V1 and V2 algorithms, though there are cases where all three scores are 
clustered closely together. It is also noteworthy that neither V1 nor V2 
consistently outperforms the other; their ratings are usually close together. 

Anonymous comments solicited from the survey participants revealed some 
of the reasons for their ratings. Multiple users noted some inconsistencies and 
mismatches between the beats of the melody and the bass notes of the 
accompaniment. We believe this may be due to (1) the algorithms moving notes that 
were originally consonant with their underlying harmonies to new metrical 
locations that then become dissonant with the underlying harmonies; (2) the 
algorithms choosing ragtime rhythmic patterns that, while technically common, do 
not “flow” with the preceding or following music; or (3) beat induction errors. 
Multiple users also commented on the low register of the accompaniment chords; 
some thought this rendered the audio muddy and distracted from the overall 
sound. We plan on investigating these issues fully in the next iteration of this 
project. 

[Fig. 5 bar charts. Legend: Sync, V1, V2. Input pieces on the horizontal axis: 
deck, eine, hark, jingle, old, spring, sugar, yankee] 


Fig. 5. Survey responses from college undergraduates (top), and internet respondents 
(bottom), separated by classical input composition and by ragtime algorithm 
(Syncopalooza, rhythm shifter version 1, rhythm shifter version 2). Participants were asked 
how much they agreed that the output music sounded like ragtime. 


Abstract. Salsa is a long-established music genre. It has been used 
as a way to define, identify and express social beliefs. Because the 
computational study of this genre has been limited, we consider it relevant 
to identify and analyze its musical features. To this end, we train a 
probabilistic parser on a corpus of Grupo Niche songs to generate the 
production rules of an induced probabilistic context-free grammar. In 
addition, we implement a web-based tool to support musical composition and 
to generate Salsa songs automatically. In this work, we also compare three 
automatically generated songs using cross-validation on the corpus. The 
results suggest that the grammar is stable: the precision of the generated 
songs measured against songs in the corpus is close to their precision 
against songs that are not in the corpus. 


Keywords: Salsa • Treebank • Probabilistic context-free grammar • 
Rules • Probabilistic parser • Musical composition • Web-based tool • 
Precision • Recall • Automatic songs 


1 Introduction 

Salsa is a long-established music genre in modern Latin American culture, and it 
has been used as a way to define, identify and express social beliefs. For this 
reason, ethnomusicologists have argued that Salsa has a unique set of features 
that distinguishes it from other rhythms [1]. Despite its popularity, this genre 
has not been formally analyzed in order to understand the components that 
define it and make it different from other rhythms. In the context of computer 
music, we are aware of only one research work [1], developed in 2015, in which 
a dataset of around 25,000 Salsa songs was used without annotated musical 
information. In our work, we annotate a corpus of Grupo Niche songs with 
harmonic information and propose a set of rules that describes the song 
structure, in a manner very similar to the bracketing guidelines for treebanks. 

In this work, we aim to integrate computational linguistics into the art of 
music generation. We want to identify and analyze the features of this music 
genre, which is part of Latin American folklore. Composers of Salsa music 
often want to follow the patterns popularized by well-known bands, and a system 
that generates melodies automatically can support their process. We would like 
to make it possible to generate Salsa music that supports musical composition 
by means of probabilistic parsers. We are particularly interested in automating 
the music of Grupo Niche, a Salsa band founded in the seventies in Cali, 
Colombia [2]. 

In this paper, we address several topics from computational linguistics, 
such as treebanks, probabilistic parsers, and induced grammars. Musical 
composition can be posed as a rule-rewriting problem in context-free 
grammars. Other approaches include the use of logic, hierarchical structures, 
constraint programming, Markov models, L-systems, and concurrency theories. 

This document is organized as follows: Sect. 2 presents related work, Sect. 3 
describes the Salsa music treebank, Sect. 4 explains music generation using 
probabilistic context-free grammars, and finally, we present the experimental 
setting and results in Sect. 5 and conclude with some remarks and future work in Sect. 6. 

2 Related Work 

Stochastic processes may be applied to musical analysis, sound synthesis, and 
composition. Specifically, the n-dimensional property of probabilistic 
grammars can model the four properties of sound: pitch, tone, volume, and rate [3]. 
The context-free grammar is considered the best type of grammar for 
representing music, owing to its ability to represent multi-leveled syntactic 
formations [4]. The similarity between Natural Language Processing (NLP) and 
music processing allows techniques from NLP (e.g. probabilistic context-free 
grammars) to be applied to music processing, through hierarchical structures 
built from fragments of melody [5]. 

On the other hand, general patterns in musical composition have been used to 
generate new music automatically [6,7]; in particular, an implementation of 
a machine learning system trained on a treebank of monophonic melodies is 
described in [7]. In this connection, [8] describes how to write production 
rules manually, with assigned probabilities, based on the conventions of 
Bach’s music, and how to create a stochastic grammar that generates new 
melodies. These production rules use concepts such as musical areas to group 
the chords, which are the terminals of the grammar. To decide what information 
to annotate in a treebank of songs, we draw on three papers that annotate 
songs in different ways using music theory concepts [9-11]. 

3 Salsa Music Treebank 

The musical treebank consists of twenty-eight syntax trees, each one 
representing a song. The songs are restricted by composer (Grupo Niche) and by 
date of composition (1980 to 1999), because Salsa music has evolved over time. 
Building the treebank required gathering piano and bass scores of the 
following compositions: Ana Mile, 
A Prueba De Fuego, Busca Por Dentro, Cali Aji, Cali Pachanguero, Canoa 
Rocha, Caso Social, Como Podre Disimular, Del Puente, Digo Yo, El Coco, 
Ese Dia, Etnia, Hagamos Lo Que Diga El Corazon, Han Cogido La Cosa, La 
Carcel, La Culebra, La Danza De La Chancaca, La Magia De Tus Besos, La 
Negra No Quiere, Listo Medellin, Me Sabe A Peru, Mi Pueblo Natal, Miserable, 
Nuestro Sueno, Se Parecio Tanto A Ti, Sin Sentimiento and Solo Un Carino. 
It was also necessary to define a process, based on concepts of music theory, 
for extracting the explicit sequence of chords of each song from the scores. 


3.1 Syntactic Functions of Nonterminal Symbols 

In music, every song can be divided into several parts, following the sections 
into which popular songs have traditionally been split. These parts could be, 
for instance, introduction, verse, chorus, etc., and every song is annotated by 
dividing it into those parts [9]. We take this idea and adapt it to the context 
of Salsa songs. The first nonterminal symbols defined are therefore the 
different elements of a Salsa song: anacrusis, introduction, verse, chorus, 
instrumental, pregon (which is particular to Salsa and its most representative 
element) and coda. Introduction, verse, instrumental and pregon are found in 
all the songs that make up the treebank, whereas the remaining elements 
(anacrusis, chorus, and coda) are not present in all of them. All or some of 
these parts compose the initial symbol S (which symbolizes the whole song) in 
a specific order. 

In the same way, following an example of annotation proposed by Weyde 
and Wissmann [11], each element of the song is composed of musical cadences, 
or of musical areas when the chords do not appear in the order of a common 
cadence (predominant, dominant, tonic; or dominant, tonic) or of an Andalusian 
cadence (tonic, dominant, predominant, dominant) [12]. The musical areas into 
which a chord can be classified are, precisely, predominant, dominant and 
tonic; when areas appear in cadence order, they are grouped under a nonterminal 
symbol representing that cadence. Each area, in turn, is composed of one or 
more nonterminal symbols that immediately precede a terminal symbol (a chord). 
The nonterminal of the area thus determines how many chords of that area occur 
consecutively, and each chord is contained in a nonterminal symbol of its 
corresponding area. 

Besides this, new nonterminal symbols become necessary to mark the end 
of each element of the song in the sequence of chords. These nonterminal 
symbols are named with the string “FIN” followed by the symbol of the element 
that is ending (for example, FININTRO). Each one leads to a period (‘.’) as its 
terminal symbol. 

One particularity of the anacrusis is that it always consists of a single 
dominant or tonic area, as shown in Table 1. The table also shows that each 
element finishes with the nonterminal symbol related to its end, which in turn 
always leads to a period. 


3.2 Terminal Symbols 

The terminal symbols are: ‘i’, ‘ii’, ‘III’, ‘iv’, ‘IV7’, ‘V7’, ‘VI’, ‘VII’, ‘.’. 

With this in mind, the terminal symbols are the chords written as Roman 
numerals, which in music theory indicate the degrees of a scale. Some are in 
uppercase (major chords) and some in lowercase (minor chords), following the 
same theory. The key is always minor, because most Salsa songs are in a minor 
key. If a song is in a major key, it is translated to its relative minor. The IV 
and V chords carry an explicit seventh because that is how they appear in the 
scores. In addition, the period has been added as a terminal symbol for the 
reasons set out above. 

Table 1. One-level depth examples of each nonterminal symbol 

Symbol  Description         Example 
------  ------------------  ------- 
S       Initial symbol      (S (INTRO ...) (VERSO ...) (CORO ...) (INS ...) (INS ...) (PREGON ...) (CODA ...)) 
ANA     Anacrusis           (ANA (AD (D ...) (D ...))) 
INTRO   Introduction        (INTRO (AT ...) (CD ...) (AT ...) (AD ...) (FININTRO ...)) 
VERSO   Verse               (VERSO (CD ...) (CD ...) (CD ...) (CD ...) (CD ...) (CD ...) (AD ...) (FINVERSO ...)) 
CORO    Chorus              (CORO (CD ...) (CD ...) (CD ...) (CD ...) (ASD ...) (FINCORO ...)) 
INS     Instrumental        (INS (AT ...) (CD ...) (AT ...) (AD ...) (FININS ...)) 
PREGON  Pregon              (PREGON (CD ...) (CD ...) (CD ...) (CD ...) (CD ...) (ASD ...) (FINPREGON ...)) 
CODA    Coda                (CODA (AT ...) (ASD ...) (AD ...) (FINCODA ...)) 
CD      Common cadence      (CD (ASD ...) (AD ...) (AT ...)) 
CF      Andalusian cadence  (CF (AT ...) (AD ...) (ASD ...) (AD ...)) 
ASD     Predominant area    (ASD (SD ...) (SD ...)) 
AD      Dominant area       (AD (D ...) (D ...) (D ...) (D ...) (D ...)) 
AT      Tonic area          (AT (T ...) (T ...) (T ...) (T ...) (T ...)) 
SD      Predominant chord   (SD ‘ii’) 
D       Dominant chord      (D ‘V7’) 
T       Tonic chord         (T ‘i’) 
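
The translation of a major-key song to its relative minor, described in Sect. 3.2, can be illustrated with a small sketch. It is not taken from the paper: the table and function names are hypothetical, and only the seven diatonic triads are covered. It relies on two standard facts of music theory: the relative minor tonic lies three semitones below the major tonic, and each diatonic chord keeps its notes but is renamed relative to the new tonic.

```python
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

# Diatonic triads of a major key, renamed as degrees of its relative minor.
MAJOR_TO_RELATIVE_MINOR = {
    "I": "III", "ii": "iv", "iii": "v", "IV": "VI",
    "V": "VII", "vi": "i", "vii°": "ii°",
}

def relative_minor_tonic(major_tonic):
    """The relative minor lies three semitones below the major tonic."""
    return NOTE_NAMES[(NOTE_NAMES.index(major_tonic) - 3) % 12]

def to_relative_minor(chords):
    """Rename major-key Roman numerals as degrees of the relative minor."""
    return [MAJOR_TO_RELATIVE_MINOR[c] for c in chords]
```

For a song written in C major, the chords G and C are V and I; in the relative key of A minor they become VII and III, which are the labels used for Cali Aji in the treebank.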

Figure 1 shows the syntax tree for the fragment, shown in Fig. 2, of the song 
Cali Aji, which is transcribed in the treebank in A minor, the relative minor 
of C major. It is an example of the annotation given to every song, and it 
corresponds to the following bracket annotation: 

(S 
  (ANA (AD (D VII) (D VII))) 
  (INTRO 
    (CD (AD (D VII) (D VII) (D VII) (D VII)) (AT (T III) (T III) (T III) (T III))) 
    (CD (AD (D VII) (D VII) (D VII) (D VII)) (AT (T III) (T III) (T III) (T III))) 
    (AD (D VII) (D VII) (D VII) (D VII)) 
    (FININTRO .)) 
  (CORO 
    (CD (AD (D VII) (D VII) (D VII) (D VII)) (AT (T III) (T III) (T III) (T III))))) 
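
To make the bracket notation concrete, the following sketch reads such an annotation into a nested structure and recovers the chord sequence from its leaves. It is only an illustration written for this purpose: the function names and the (label, children) tuple representation are assumptions, and the authors' own tooling (NLTK, according to Sect. 4.3) is not used here so that the example stays self-contained.

```python
def parse_brackets(text):
    """Parse '(LABEL child child ...)' into a (label, [children]) tuple."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read():
        nonlocal pos
        assert tokens[pos] == "(", "expected an opening bracket"
        label = tokens[pos + 1]
        pos += 2
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(read())       # nested nonterminal
            else:
                children.append(tokens[pos])  # terminal: chord or period
                pos += 1
        pos += 1  # consume the closing bracket
        return (label, children)

    return read()

def leaves(tree):
    """Left-to-right terminal symbols of a parsed tree."""
    _, children = tree
    out = []
    for child in children:
        out.extend(leaves(child) if isinstance(child, tuple) else [child])
    return out
```

Applied to the annotation of Cali Aji, `leaves` returns the chord sequence of the fragment, with a period marking the end of the introduction.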

Fig. 1. Syntax tree of a fragment of the song Cali Aji. 

Fig. 2. Piano score of a fragment of the song Cali Aji. 


4 Composing Salsa Using Probabilistic Context-Free Grammars 

In this section, we present a music generation model based on induced 
probabilistic context-free grammars (PCFG). We take the chords of the trees 
generated by the grammar and automatically generate a song’s melody. Figure 3 
shows a diagram of the music generation model. First, we train an induced 
grammar on the Grupo Niche Treebank and produce a set of production rules, 
which constitutes a PCFG. Second, we implement a chord generation algorithm 
that is driven by the PCFG and yields the sequence of chords. Finally, we 
implement a melody generator that uses the sequence of chords together with 
the tempo and tonality settings. Below we explain each element of the model in 
detail. 


4.1 Computational Model Based on PCFG 

Before explaining induced grammars, we define a context-free grammar (CFG) 
as a set of production rules that describes the syntactically valid strings of 
a language. A CFG is a tuple $G = (N, \Sigma, P, S)$, where $P$ is a set of 
production rules, $N$ is a set of non-terminal symbols, $\Sigma$ is a set of 
terminal symbols and, finally, $S$ is the initial non-terminal symbol. The 
production rules follow the form $A \to X$, where $A \in N$ and 
$X \in (\Sigma \cup N)^{+}$. A PCFG is a context-free grammar with an associated 
probability distribution over $P$: each production rule $\alpha \to \beta \in P$ 
has an associated probability $q(\alpha \to \beta)$. For any $X \in N$, we have 
the constraint: 

$$\sum_{\alpha \to \beta \in P \,:\, \alpha = X} q(\alpha \to \beta) = 1$$ 

In addition, we have $q(\alpha \to \beta) \geq 0$ for any $\alpha \to \beta \in P$. 

When we have a PCFG induced from a treebank, the probability of each 
production rule $\alpha \to \beta$ is estimated using maximum-likelihood estimation: 

$$p(\alpha \to \beta) = \frac{\mathrm{Count}(\alpha \to \beta)}{\mathrm{Count}(\alpha)}$$ 

where $\mathrm{Count}(\alpha \to \beta)$ is the number of times that the rule 
$\alpha \to \beta$ is seen in the treebank, and $\mathrm{Count}(\alpha)$ is the 
number of times that the non-terminal $\alpha$ is seen on the left-hand side of 
a rule in the treebank. 

Fig. 3. Diagram of the development of the computational model 
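
In more detail, the algorithm that estimates the probabilities of occurrence 
is the following: 

# Assumes NLTK, the library named in Sect. 4.3; ProbabilisticProduction is 
# NLTK's class for a production rule with an attached probability. 
from nltk.grammar import ProbabilisticProduction 

def estimate_probabilities(productions): 
    pcount = {}  # number of occurrences of each production 
    lcount = {}  # number of occurrences of each left-hand side 
    for prod in productions: 
        lcount[prod.lhs()] = lcount.get(prod.lhs(), 0) + 1 
        pcount[prod] = pcount.get(prod, 0) + 1 
    prods = [] 
    for p in pcount: 
        # maximum-likelihood estimate: Count(lhs -> rhs) / Count(lhs) 
        prods.append(ProbabilisticProduction( 
            p.lhs(), p.rhs(), prob=pcount[p] / lcount[p.lhs()])) 
    return prods 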

First, given the treebank of Grupo Niche songs, the rules are induced from 
it by iterating top-down through all the syntax trees. Once we have 
the production rules, the probabilistic grammar is obtained by estimating the 
probabilities of occurrence from the list of productions, as described above. 
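
The whole induction step, from trees to rule probabilities, can be traced on a toy example. The sketch below is self-contained and does not use NLTK: the (label, children) tuple representation of a tree and the function names are assumptions made for illustration, and the three trees in the usage note are invented, not taken from the treebank.

```python
from collections import Counter

def productions(tree):
    """Yield one (lhs, rhs) pair per interior node, visiting the tree top-down."""
    label, children = tree
    rhs = tuple(c[0] if isinstance(c, tuple) else c for c in children)
    yield (label, rhs)
    for child in children:
        if isinstance(child, tuple):
            yield from productions(child)

def induce_probabilities(trees):
    """Maximum-likelihood estimate: Count(lhs -> rhs) / Count(lhs)."""
    rule_count = Counter(p for t in trees for p in productions(t))
    lhs_count = Counter()
    for (lhs, _), n in rule_count.items():
        lhs_count[lhs] += n
    return {rule: n / lhs_count[rule[0]] for rule, n in rule_count.items()}
```

With two toy cadences of the form (CD (AD ...) (AT ...)) and one of the form (CD (ASD ...) (AD ...) (AT ...)), the rule CD -> AD AT receives probability 2/3 and CD -> ASD AD AT receives 1/3, so the probabilities of all rules with the same left-hand side sum to 1.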

The induced grammar fulfills the characteristics of a PCFG. Thus, the 
probabilities of all the choices for a nonterminal symbol must sum to 1.0. Some 
induced production rules are shown below. According to the assigned 
probabilities, the algorithm chooses one production rule among those available 
for a nonterminal symbol (for example, the rule for S shown below), and 
proceeds in the same way for the other symbols. 

S -> INTRO VERSO CORO INS VERSO CORO INS PREGON INS PREGON INS CODA [0.0357143] 
PREGON -> CD CD CD CD CD FINPREGON [0.0434783] 
CD -> AD AT [0.516279] 
CD -> ASD AD AT [0.483721] 
AD -> D D D D [0.233561] 
D -> 'V7' [0.531322] 
FINPREGON -> '.' [1.0] 


4.2 From the Model to the Music 

As shown in Fig. 3, given the production rules, we traverse the grammar 
top-down until a sequence of terminal symbols is generated. This is done by 
a recursive method that chooses a single rule among all the options of a 
nonterminal symbol, starting with the initial symbol S. If the chosen rule 
leads to a sequence of nonterminal symbols, the process is repeated for each 
symbol, choosing a rule among its options. If instead the choice leads to a 
terminal symbol, that branch is finished and the terminal symbol is appended 
to a list, which is finally returned. In more detail, the algorithm is: 



B. Rodriguez et al. 


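import random 
from nltk.grammar import Nonterminal  # assumes an NLTK PCFG, as in Sect. 4.3 

def generate_chords(grammar, items, terminals): 
    for item in items: 
        if isinstance(item, Nonterminal): 
            # candidate rules for this symbol and their probabilities 
            prods = grammar.productions(lhs=item) 
            probs = [prod.prob() for prod in prods] 
            # weighted random choice (choose_prod in the original listing) 
            chosen_prod = random.choices(prods, weights=probs)[0] 
            generate_chords(grammar, chosen_prod.rhs(), terminals) 
        else: 
            terminals.append(item)  # terminal symbol: a chord or a period 
    return terminals 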
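
The same top-down sampling can be tried without NLTK. In the sketch below the grammar is a plain dictionary mapping each nonterminal to a list of (right-hand side, probability) pairs. The rules and their probabilities are invented for illustration and are not the induced Grupo Niche grammar; the name TOY_PCFG and the function signature are likewise assumptions.

```python
import random

# Hypothetical toy grammar: nonterminal -> list of (rhs, probability).
TOY_PCFG = {
    "S": [(("INTRO", "CORO"), 1.0)],
    "INTRO": [(("CD", "FININTRO"), 1.0)],
    "CORO": [(("CD", "CD"), 1.0)],
    "CD": [(("AD", "AT"), 0.5), (("ASD", "AD", "AT"), 0.5)],
    "ASD": [(("SD",), 1.0)],
    "AD": [(("D",), 0.6), (("D", "D"), 0.4)],
    "AT": [(("T",), 1.0)],
    "SD": [(("iv",), 1.0)],
    "D": [(("V7",), 0.5), (("VII",), 0.5)],
    "T": [(("i",), 0.7), (("III",), 0.3)],
    "FININTRO": [((".",), 1.0)],
}

def generate_chords(grammar, symbol, rng):
    """Expand symbol recursively; symbols without rules are terminals."""
    if symbol not in grammar:
        return [symbol]
    options = grammar[symbol]
    rhs = rng.choices([r for r, _ in options],
                      weights=[p for _, p in options])[0]
    chords = []
    for s in rhs:
        chords.extend(generate_chords(grammar, s, rng))
    return chords
```

Calling `generate_chords(TOY_PCFG, "S", random.Random(1))` returns a flat list of chords with a single period after the introduction. Returning the list from each call, rather than mutating a shared argument, keeps the recursion easy to test.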


Following the diagram of the development, the sequence of chords must be 
processed further to become the melodies of an automatic song. To achieve this, 
it is necessary to choose the instruments that take part in the process. 
The representative non-percussion instruments of Salsa chosen are electric 
bass and piano, because these are considered, together with the percussion 
instruments, the basis of the genre. 

Specifically, to generate the piano melodies, the rhythm is divided into four 
variations that are assigned cyclically to every chord in the sequence, from 
the second chord to the last. The first chord is assigned a rhythm variation 
specially designed for it. Every rhythm variation represents a half measure. 
To generate the electric bass melodies, the same process is followed, but with 
only two variations, which are assigned to the sequence of chords excluding 
the first chord. Generating these melodies requires the tonality and the tempo 
to be set beforehand, as shown in Fig. 3. 
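
The cyclic assignment of rhythm variations can be sketched as follows. The variation labels (P0 to P4, B1, B2) and the function names are placeholders invented for this example, and reading the bass assignment as skipping the first chord entirely is an interpretation of the description above, not something the paper states in code.

```python
from itertools import cycle

def assign_piano(chords, variations=("P1", "P2", "P3", "P4"), opening="P0"):
    """First chord gets the opening variation; the rest cycle through four."""
    if not chords:
        return []
    rest = list(zip(chords[1:], cycle(variations)))
    return [(chords[0], opening)] + rest

def assign_bass(chords, variations=("B1", "B2")):
    """Two variations cycle over the chords, excluding the first chord."""
    return list(zip(chords[1:], cycle(variations)))
```

Each (chord, variation) pair stands for one half measure; rendering the pairs to notes in a given tonality and tempo is outside the scope of this sketch.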

In order to complement the song (i.e. to make it sound more like Salsa), we 
add loops of some percussion instruments (cowbell, congas, clave, and maraca) 
to the melodies mentioned above. 

The process of obtaining the melodies from the sequence of chords is 
implemented in Python with the GNU software LilyPond, which allows music to be 
written in a text-based language and generates the melodies in MIDI format. 


4.3 Practical Tool for Supporting Composers 

In order to visualize the possible outcomes of the model, we implement a 
web-based tool that makes it intuitive for users. We are convinced that music 
composers may be interested in the patterns that Grupo Niche used in their 
songs. The NLP processes are supported by the Python library Natural Language 
Toolkit (NLTK). 

As shown in Fig. 4, the web-based tool needs the tonality and the tempo of 
the composition, which the user must provide as input parameters. After the 
user clicks on the “Generate” button, the model builds a song based on those 
parameters. The song structure is shown in a web player, and the user can 
download the bass and piano scores of the song. 

Fig. 4. Graphical user interface of the web-based tool 

5 Experiments and Results 

In this section, we explain the experiments performed, which consist of three 
automatically generated songs that we analyze in terms of their chord 
progressions and their hierarchical structures. These songs differ from the 
annotated songs in the treebank. It is important to note that the scarcity of 
published scores for Salsa music makes it difficult to gather scores for the 
experiments. 


5.1 Precision and Recall 

We use the precision and recall measures as defined in [13]. There is no formal 
technique for interpreting these measures in the context of large syntax trees 
(around 300 leaf nodes and 700 interior nodes in each tree); the syntax trees 
used in NLP generally have up to 30 leaf nodes. In our case, the performance 
measures are adapted to the evaluation of automatically generated songs: we 
compute them against human-composed songs, instead of evaluating the parsing of 
a sequence of terminals. We evaluate three sets of human-composed songs: songs 
composed by Grupo Niche that are in the treebank, songs composed by Grupo Niche 
that are not in the treebank, and Salsa songs that are not composed by Grupo 
Niche. 
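
The paper defers the definition of both measures to [13]. As a rough illustration only, the sketch below assumes the common labelled-constituent definition used in parser evaluation: a node of one tree counts as matched when the other tree contains a node with the same label spanning the same leaf positions. The tuple representation of trees and all function names are assumptions, and the authors' adapted measure may differ.

```python
from collections import Counter

def constituents(tree, start=0):
    """Return (Counter of (label, start, end) spans, end position)."""
    label, children = tree
    spans = Counter()
    pos = start
    for child in children:
        if isinstance(child, tuple):
            sub, pos = constituents(child, pos)
            spans += sub
        else:
            pos += 1  # a terminal occupies one leaf position
    spans[(label, start, pos)] += 1
    return spans, pos

def precision_recall(candidate, reference):
    """Matched spans over candidate spans, and over reference spans."""
    cand, _ = constituents(candidate)
    ref, _ = constituents(reference)
    matched = sum((cand & ref).values())
    return matched / sum(cand.values()), matched / sum(ref.values())
```

Because the number of matched nodes is shared by both ratios, the smaller of the two values always belongs to the tree with more interior nodes.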

Tables 2, 3 and 4 show the results of comparing one automatic song with each 
set of human-composed songs. For each comparison we take the higher of 
precision and recall, because which of the two is higher depends only on which 
song's tree, the automatic or the human-composed one, has more interior nodes. 
The general results with three automatic songs show that these values lie in 
a range from 17 to 30 and that, on the whole, the values for the songs not 
composed by Grupo Niche are lower than those in the other tables, but not by 
much. 

Table 2. Results of applying precision and recall measures comparing an automatic 
song with every song in the treebank 

Song’s name                     Precision  Recall 
------------------------------  ---------  ------ 
Ana Mile                        20.190     11.756 
A Prueba De Fuego               22.565     13.085 
Busca Por Dentro                24.940     15.395 
Cali Aji                        24.940     16.587 
Cali Pachanguero                27.315     17.215 
Canoa Rocha                     26.840     14.790 
Caso Social                     24.465     13.419 
Como Podre Disimular            25.890     18.136 
Del Puente                      25.653     10.577 
Digo Yo                         25.653     12.705 
El Coco                         26.128     18.425 
Ese Dia                         19.477      9.692 
Etnia                           20.665     10.369 
Hagamos Lo Que Diga El Corazon  23.752     16.393 
Han Cogido La Cosa              23.515     12.547 
La Carcel                       20.427     10.449 
La Culebra                      23.040     14.741 
La Danza De La Chancaca         26.365     14.471 
La Magia De Tus Besos           22.565     15.422 
La Negra No Quiere              28.266     13.134 
Listo Medellin                  28.741     13.781 
Me Sabe A Peru                  22.802     14.814 
Mi Pueblo Natal                 20.902     17.886 
Miserable                       24.940     15.601 
Nuestro Sueno                   21.378     13.081 
Se Parecio Tanto A Ti           21.852     13.294 
Sin Sentimiento                 18.289     12.520 
Solo Un Carino                  22.565     12.195 

Table 3. Results of applying precision and recall measures comparing an automatic 
song with Grupo Niche songs that are not in the treebank 

Song’s name                     Precision  Recall 
------------------------------  ---------  ------ 
A Mi Medida                     15.914     11.571 
Las Mujeres Estan De Moda       23.040     15.695 
Mi Valle Del Cauca              27.078     18.269 
Ni Como Amiga                   23.040     15.299 

Table 4. Results of applying precision and recall measures comparing an automatic 
song with Salsa songs that are not composed by Grupo Niche 

Song’s name                     Precision  Recall 
------------------------------  ---------  ------ 
Ahora Quien                     13.438      8.959 
Arrepentida                     18.281     25.167 
Te Voy A Enseñar                16.223     19.198 

Based on the small range of the values in the table for the songs in the 
treebank, it follows that the nodes of the syntax trees of the automatic 
songs are built in almost equal proportions from every song in the treebank. 
This indicates a harmonic similarity between the two groups. Further, because 
the difference between the values for compositions by Grupo Niche and the 
values in the remaining table is small, their harmonic structures differ 
slightly, but most harmonic patterns are shared by both groups. Given the 
similarity between the values for the songs in the treebank and for the Grupo 
Niche songs that are not in it, we consider the grammar well-formed, since it 
covers a large number of the harmonic patterns used by Grupo Niche. 

From the results obtained with the performance measures, we deduce that the 
song structure of an automatic song always follows the ordering of one or more 
songs in the treebank, which means that the automatic songs have an ordering 
used by Grupo Niche. Similarly, the cadences and the musical areas inside 
every part are organized as in one or more songs in the treebank. Thus, the 
annotated musical patterns in the automatic songs are the same ones that the 
band used in one or more songs. This is why the musical analysis of the 
experiments shows that the chord progression of each experiment follows the 
general musical rules established in this genre, and specifically in the Grupo 
Niche songs. The rhythm variations of the instruments are selected from several 
songs in the treebank. 

6 Concluding Remarks and Future Work 

We implemented a formal musical system that generates music through formal NLP 
techniques. Because no formal procedure is known for assessing syntax trees as 
wide as those described above, the data was analyzed in a way that allowed us 
to extract information from it. Precision and recall measures were used 
and analyzed for that purpose. 

The absence of prior literature on computational models of Salsa music meant 
that no annotated treebank or standard for music annotation was available. 
Furthermore, the lack of head rules affected the quality of the induction 
process, because the function of a chord inside a song differs from the 
function of a word in a sentence. 

Inducing a grammar from a treebank has proved to be an accurate way to 
generate music that follows the musical patterns recorded in the treebank, 
and the result can itself be analyzed and evaluated. Broadly, this project 
makes three contributions to the academic field: a treebank of Salsa songs, a 
generative grammar of the Salsa genre, and an analysis of Grupo Niche songs 
that supports musical composition. 

We plan to continue this work in three directions: extending the range of 
instruments to include brass, because of its melodic role and its 
contribution to Salsa music; implementing head rules for musical grammars, 
which would add a high degree of sophistication; and adapting computational 
techniques such as evolutionary computation and artificial intelligence to 
this work. 